📌 Professional Data Analysis Jupyter Notebook


📌 Step 1: Title & Introduction

📌 Step 2: Importing Necessary Libraries

📌 Step 3: Loading & Understanding the Dataset

📌 Step 4: Data Summary & Initial Cleaning

📌 Step 5: Handling Missing Values

📌 Step 6: Identifying & Removing Duplicates

📌 Step 7: Checking for Inconsistent Categorical Values

📌 Step 8: Data Type Conversion & Column Renaming

📌 Step 9: Detecting & Handling Outliers

📌 Step 10: Exploratory Data Analysis (EDA)

📌 Step 11: Data Visualization & Key Insights

📌 Step 12: Exporting the Cleaned Dataset

📌 Step 13: COVID-19 Data Analysis Report

Summary of Key Findings & Insights



📌 Step 1: Title & Introduction


COVID-19 Data Analysis using Python 📊

Author: Muhammad Danish Azeem

Email:

Dataset Source: COVID-19 Dataset (https://www.kaggle.com/datasets/meirnizri/covid19-dataset)

📌 Project Overview

This project analyzes the COVID-19 dataset to uncover insights into global trends, case spikes, mortality rates, and vaccination progress.
We perform data preprocessing, exploratory data analysis (EDA), and visualization to understand key patterns.

📌 Step 2: Importing Necessary Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import zscore
import math
from scipy.stats.mstats import winsorize

📌 Step 3: Loading & Understanding the Dataset


from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/My Drive/Data Sets/covid-data.csv')
Mounted at /content/drive

Note: By default, the notebook truncates wide or long DataFrame output. To see every column and row, raise the display limits with these commands:

pd.set_option('display.max_columns', None)  # display all columns of a DataFrame
pd.set_option('display.max_rows', None)  # display all rows of a DataFrame
# Suppress all warnings for the rest of the session
import warnings
warnings.filterwarnings('ignore')
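
The settings above stay in effect for the whole session. As a possible alternative, the sketch below lifts the limits only inside a `with` block using `pd.option_context`, so the previous settings return afterwards. The `demo` frame and its `col_*` column names are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical wide frame standing in for the real dataset.
demo = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

# Remember the current setting so we can confirm it is restored.
default_max_columns = pd.get_option("display.max_columns")

# Inside the block the limits are lifted; on exit pandas restores the old values.
with pd.option_context("display.max_columns", None, "display.max_rows", None):
    inside_max_columns = pd.get_option("display.max_columns")
    print(demo.head())

after_max_columns = pd.get_option("display.max_columns")
```

This can help when only a few cells need the full view, since printing every row of a large DataFrame can make the notebook slow to render.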

📌 Step 4: Data Summary & Initial Cleaning

# Display the first few rows
df.head()
iso_code continent location date total_cases new_cases new_cases_smoothed total_deaths new_deaths new_deaths_smoothed total_cases_per_million new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate icu_patients icu_patients_per_million hosp_patients hosp_patients_per_million weekly_icu_admissions weekly_icu_admissions_per_million weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units total_vaccinations people_vaccinated people_fully_vaccinated total_boosters new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred people_vaccinated_per_hundred people_fully_vaccinated_per_hundred total_boosters_per_hundred new_vaccinations_smoothed_per_million new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred stringency_index population_density median_age aged_65_older aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand life_expectancy human_development_index population excess_mortality_cumulative_absolute excess_mortality_cumulative excess_mortality excess_mortality_cumulative_per_million
0 AFG Asia Afghanistan 2020-01-03 NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 54.422 18.6 2.581 1.337 1803.987 NaN 597.029 9.59 NaN NaN 37.746 0.5 64.83 0.511 41128772.0 NaN NaN NaN NaN
1 AFG Asia Afghanistan 2020-01-04 NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 54.422 18.6 2.581 1.337 1803.987 NaN 597.029 9.59 NaN NaN 37.746 0.5 64.83 0.511 41128772.0 NaN NaN NaN NaN
2 AFG Asia Afghanistan 2020-01-05 NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 54.422 18.6 2.581 1.337 1803.987 NaN 597.029 9.59 NaN NaN 37.746 0.5 64.83 0.511 41128772.0 NaN NaN NaN NaN
3 AFG Asia Afghanistan 2020-01-06 NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 54.422 18.6 2.581 1.337 1803.987 NaN 597.029 9.59 NaN NaN 37.746 0.5 64.83 0.511 41128772.0 NaN NaN NaN NaN
4 AFG Asia Afghanistan 2020-01-07 NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 54.422 18.6 2.581 1.337 1803.987 NaN 597.029 9.59 NaN NaN 37.746 0.5 64.83 0.511 41128772.0 NaN NaN NaN NaN
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302512 entries, 0 to 302511
Data columns (total 67 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   iso_code                                    302512 non-null  object 
 1   continent                                   288160 non-null  object 
 2   location                                    302512 non-null  object 
 3   date                                        302512 non-null  object 
 4   total_cases                                 266771 non-null  float64
 5   new_cases                                   294064 non-null  float64
 6   new_cases_smoothed                          292800 non-null  float64
 7   total_deaths                                246214 non-null  float64
 8   new_deaths                                  294139 non-null  float64
 9   new_deaths_smoothed                         292909 non-null  float64
 10  total_cases_per_million                     266771 non-null  float64
 11  new_cases_per_million                       294064 non-null  float64
 12  new_cases_smoothed_per_million              292800 non-null  float64
 13  total_deaths_per_million                    246214 non-null  float64
 14  new_deaths_per_million                      294139 non-null  float64
 15  new_deaths_smoothed_per_million             292909 non-null  float64
 16  reproduction_rate                           184817 non-null  float64
 17  icu_patients                                34764 non-null   float64
 18  icu_patients_per_million                    34764 non-null   float64
 19  hosp_patients                               35138 non-null   float64
 20  hosp_patients_per_million                   35138 non-null   float64
 21  weekly_icu_admissions                       9101 non-null    float64
 22  weekly_icu_admissions_per_million           9101 non-null    float64
 23  weekly_hosp_admissions                      21287 non-null   float64
 24  weekly_hosp_admissions_per_million          21287 non-null   float64
 25  total_tests                                 79387 non-null   float64
 26  new_tests                                   75403 non-null   float64
 27  total_tests_per_thousand                    79387 non-null   float64
 28  new_tests_per_thousand                      75403 non-null   float64
 29  new_tests_smoothed                          103965 non-null  float64
 30  new_tests_smoothed_per_thousand             103965 non-null  float64
 31  positive_rate                               95927 non-null   float64
 32  tests_per_case                              94348 non-null   float64
 33  tests_units                                 106788 non-null  object 
 34  total_vaccinations                          73561 non-null   float64
 35  people_vaccinated                           70411 non-null   float64
 36  people_fully_vaccinated                     68149 non-null   float64
 37  total_boosters                              42324 non-null   float64
 38  new_vaccinations                            60542 non-null   float64
 39  new_vaccinations_smoothed                   163536 non-null  float64
 40  total_vaccinations_per_hundred              73561 non-null   float64
 41  people_vaccinated_per_hundred               70411 non-null   float64
 42  people_fully_vaccinated_per_hundred         68149 non-null   float64
 43  total_boosters_per_hundred                  42324 non-null   float64
 44  new_vaccinations_smoothed_per_million       163536 non-null  float64
 45  new_people_vaccinated_smoothed              163587 non-null  float64
 46  new_people_vaccinated_smoothed_per_hundred  163587 non-null  float64
 47  stringency_index                            193194 non-null  float64
 48  population_density                          256703 non-null  float64
 49  median_age                                  238751 non-null  float64
 50  aged_65_older                               230391 non-null  float64
 51  aged_70_older                               236359 non-null  float64
 52  gdp_per_capita                              233979 non-null  float64
 53  extreme_poverty                             150700 non-null  float64
 54  cardiovasc_death_rate                       234406 non-null  float64
 55  diabetes_prevalence                         246348 non-null  float64
 56  female_smokers                              175815 non-null  float64
 57  male_smokers                                173423 non-null  float64
 58  handwashing_facilities                      114817 non-null  float64
 59  hospital_beds_per_thousand                  206911 non-null  float64
 60  life_expectancy                             278219 non-null  float64
 61  human_development_index                     227212 non-null  float64
 62  population                                  302512 non-null  float64
 63  excess_mortality_cumulative_absolute        10295 non-null   float64
 64  excess_mortality_cumulative                 10295 non-null   float64
 65  excess_mortality                            10295 non-null   float64
 66  excess_mortality_cumulative_per_million     10295 non-null   float64
dtypes: float64(62), object(5)
memory usage: 154.6+ MB
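
The `df.info()` output lists `date` with dtype `object`, meaning the dates are stored as strings. Below is a minimal sketch of one way to convert such a column, assuming the `YYYY-MM-DD` format seen in the rows printed by `df.head()`. The toy frame, including its deliberately invalid third value, is hypothetical.

```python
import pandas as pd

# Toy rows mimicking the dataset's date strings; the last value is invalid on purpose.
toy = pd.DataFrame({
    "location": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "date": ["2020-01-03", "2020-01-04", "not a date"],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d", errors="coerce")

print(toy.dtypes)
print(toy["date"].isna().sum(), "value(s) could not be parsed")
```

With a real datetime column, date-based filtering, sorting, and resampling become available through the `.dt` accessor and time-aware indexing.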

# Observations

  1. The dataset has 302,512 rows and 67 columns.
  2. 62 columns are float64 and 5 are object (iso_code, continent, location, date, tests_units); date is stored as text, not as a datetime.
  3. The columns in the dataset are:
    • 'iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases', 'new_cases_smoothed', 'total_deaths', 'new_deaths', 'new_deaths_smoothed', 'total_cases_per_million', 'new_cases_per_million', 'new_cases_smoothed_per_million', 'total_deaths_per_million', 'new_deaths_per_million', 'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients', 'icu_patients_per_million', 'hosp_patients', 'hosp_patients_per_million', 'weekly_icu_admissions', 'weekly_icu_admissions_per_million', 'weekly_hosp_admissions', 'weekly_hosp_admissions_per_million', 'total_tests', 'new_tests', 'total_tests_per_thousand', 'new_tests_per_thousand', 'new_tests_smoothed', 'new_tests_smoothed_per_thousand', 'positive_rate', 'tests_per_case', 'tests_units', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'total_boosters', 'new_vaccinations', 'new_vaccinations_smoothed', 'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred', 'people_fully_vaccinated_per_hundred', 'total_boosters_per_hundred', 'new_vaccinations_smoothed_per_million', 'new_people_vaccinated_smoothed', 'new_people_vaccinated_smoothed_per_hundred', 'stringency_index', 'population_density', 'median_age', 'aged_65_older', 'aged_70_older', 'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers', 'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index', 'population', 'excess_mortality_cumulative_absolute', 'excess_mortality_cumulative', 'excess_mortality', 'excess_mortality_cumulative_per_million'
  4. Many columns contain missing values; we examine them in detail and handle them later in the notebook.
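
To put numbers on the missing values noted above, one option is a per-column summary of missing counts and percentages. The following is a minimal sketch on a toy frame. The helper name `missing_summary` and the toy values are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing count and percentage per column, most incomplete first."""
    missing_count = frame.isna().sum()
    missing_pct = (missing_count / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing_count": missing_count, "missing_pct": missing_pct})
    # Keep only columns that actually have gaps.
    return summary[summary["missing_count"] > 0].sort_values("missing_pct", ascending=False)

# Hypothetical toy data using a few of the dataset's column names.
toy = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "total_cases": [np.nan, 10.0, 20.0, np.nan],
    "icu_patients": [np.nan, np.nan, np.nan, 5.0],
})

report = missing_summary(toy)
print(report)
```

Calling `missing_summary(df)` on the full dataset would rank all 67 columns the same way, which can help decide which columns to drop and which to impute.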

df.sample(50)
iso_code continent location date total_cases new_cases new_cases_smoothed total_deaths new_deaths new_deaths_smoothed total_cases_per_million new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate icu_patients icu_patients_per_million hosp_patients hosp_patients_per_million weekly_icu_admissions weekly_icu_admissions_per_million weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units total_vaccinations people_vaccinated people_fully_vaccinated total_boosters new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred people_vaccinated_per_hundred people_fully_vaccinated_per_hundred total_boosters_per_hundred new_vaccinations_smoothed_per_million new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred stringency_index population_density median_age aged_65_older aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand life_expectancy human_development_index population excess_mortality_cumulative_absolute excess_mortality_cumulative excess_mortality excess_mortality_cumulative_per_million
82258 SWZ Africa Eswatini 2022-10-11 73436.0 0.0 3.714 1422.0 0.0 0.000 61111.111 0.000 3.091 1183.343 0.000 0.000 0.11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN NaN 0.0 198.0 0.016 23.15 79.492 21.5 3.163 1.845 7738.975 NaN 333.436 3.94 1.700 16.500 24.097 2.100 60.19 0.611 1.201680e+06 NaN NaN NaN NaN
31128 BOL South America Bolivia 2020-02-02 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.00 10.202 25.4 6.704 4.393 6885.829 7.1 204.299 6.89 NaN NaN 25.383 1.100 71.51 0.718 1.222411e+07 NaN NaN NaN NaN
95446 GAB Africa Gabon 2022-11-12 48959.0 0.0 0.000 306.0 0.0 0.000 20493.538 0.000 0.000 128.087 0.000 0.000 0.10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 102.0 NaN NaN NaN NaN 43.0 7.0 0.000 11.11 7.859 23.1 4.450 2.976 16562.413 3.4 259.967 7.20 NaN NaN NaN 6.300 66.47 0.703 2.388997e+06 NaN NaN NaN NaN
211644 PER South America Peru 2022-11-10 4163326.0 1557.0 822.429 217103.0 5.0 12.857 122272.434 45.727 24.154 6376.083 0.147 0.378 1.45 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.547293e+07 3.004870e+07 2.831865e+07 27105585.0 28141.0 21892.0 251.02 88.25 83.17 79.61 643.0 4200.0 0.012 11.11 25.129 29.1 7.151 4.455 12236.706 3.5 85.755 5.95 4.800 NaN NaN 1.600 76.74 0.777 3.404959e+07 NaN NaN NaN NaN
194617 PRK Asia North Korea 2020-07-10 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 29.0 0.001 NaN NaN samples tested NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 211.701 35.3 9.491 6.139 NaN NaN 321.681 4.00 NaN NaN NaN 13.200 72.27 NaN 2.606942e+07 NaN NaN NaN NaN
78145 GNQ Africa Equatorial Guinea 2021-05-04 7694.0 0.0 19.286 112.0 0.0 0.714 4593.663 0.000 11.514 66.869 0.000 0.426 0.47 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4766.0 NaN NaN NaN NaN 2846.0 3239.0 0.193 NaN 45.194 22.4 2.846 1.752 22604.873 NaN 202.812 7.78 NaN NaN 24.640 2.100 58.74 0.592 1.674916e+06 NaN NaN NaN NaN
250127 ZAF Africa South Africa 2020-05-12 10652.0 637.0 490.286 206.0 12.0 9.714 177.848 10.635 8.186 3.439 0.200 0.162 1.44 70.0 1.169 434.0 7.246 NaN NaN NaN NaN 369697.0 13630.0 6.225 0.229 14519.0 0.244 0.0372 26.9 people tested NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 84.26 46.754 27.3 5.344 3.053 12294.876 18.9 200.380 5.52 8.100 33.200 43.993 2.320 64.13 0.709 5.989388e+07 NaN NaN NaN NaN
197311 OWID_CYN Asia Northern Cyprus 2022-05-20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 655.0 NaN NaN NaN NaN 1711.0 46.0 0.012 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.828360e+05 NaN NaN NaN NaN
278666 TCA North America Turks and Caicos Islands 2023-03-23 6565.0 0.0 0.429 38.0 0.0 0.000 143572.585 0.000 9.373 831.037 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 37.312 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 80.22 NaN 4.572600e+04 NaN NaN NaN NaN
235960 SAU Asia Saudi Arabia 2020-09-01 315772.0 951.0 1016.857 3897.0 27.0 29.429 8672.952 26.120 27.929 107.034 0.742 0.808 0.83 NaN NaN NaN NaN NaN NaN NaN NaN 5393167.0 55801.0 150.017 1.552 54944.0 1.528 0.0179 55.7 tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 60.19 15.322 31.9 3.295 1.845 49045.411 NaN 259.538 17.72 1.800 25.400 NaN 2.700 75.13 0.854 3.640882e+07 NaN NaN NaN NaN
142214 LAO Asia Laos 2020-01-06 NaN 0.0 NaN NaN 0.0 NaN NaN 0.000 NaN NaN 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 29.715 24.4 4.029 2.322 6397.360 22.7 368.111 4.00 7.300 51.200 49.839 1.500 67.92 0.613 7.529477e+06 NaN NaN NaN NaN
260989 CHE Europe Switzerland 2020-08-19 38873.0 294.0 231.000 1762.0 0.0 0.714 4447.472 33.637 26.429 201.591 0.000 0.082 1.19 33.0 3.776 123.0 14.072 NaN NaN 57.0 6.521 540718.0 9388.0 62.213 1.080 7136.0 0.821 0.0360 27.8 tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 43.06 214.243 43.1 18.436 12.644 57410.166 NaN 99.739 5.59 22.600 28.900 NaN 4.530 83.78 0.955 8.740471e+06 NaN NaN NaN NaN
42232 BDI Africa Burundi 2021-01-07 885.0 22.0 9.571 2.0 0.0 0.000 68.660 1.707 0.743 0.155 0.000 0.000 1.18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11.11 423.062 17.5 2.562 1.504 702.225 71.7 293.068 6.05 NaN NaN 6.144 0.800 61.58 0.433 1.288958e+07 NaN NaN NaN NaN
47644 CPV Africa Cape Verde 2022-09-27 62368.0 8.0 1.714 410.0 0.0 0.000 105144.969 13.487 2.890 691.211 0.000 0.000 0.82 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 19.44 135.580 25.7 4.460 3.437 6222.554 NaN 182.219 2.42 2.100 16.500 NaN 2.100 72.98 0.665 5.931620e+05 NaN NaN NaN NaN
290731 VUT Oceania Vanuatu 2020-03-27 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 83.33 22.662 23.1 4.394 2.620 2921.909 13.2 546.300 12.02 2.800 34.500 25.209 NaN 70.47 0.609 3.267440e+05 NaN NaN NaN NaN
158060 MWI Africa Malawi 2021-12-13 62265.0 35.0 40.571 2308.0 1.0 0.143 3051.410 1.715 1.988 113.108 0.049 0.007 2.50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1138.0 0.057 0.0481 20.8 tests performed 1.560093e+06 1.276252e+06 6.349090e+05 NaN 6746.0 11361.0 7.65 6.25 3.11 NaN 557.0 11121.0 0.055 39.81 197.519 18.1 2.979 1.783 1095.042 71.4 227.349 3.94 4.400 24.700 8.704 1.300 64.26 0.483 2.040532e+07 NaN NaN NaN NaN
106844 GUM Oceania Guam 2021-04-29 7733.0 12.0 7.714 136.0 0.0 0.000 45016.096 69.856 44.907 791.697 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN 111205.0 147.0 652.099 0.862 1046.0 6.134 0.0060 166.7 tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 304.128 31.4 9.551 5.493 NaN NaN 310.496 21.52 NaN NaN NaN NaN 80.07 NaN 1.717830e+05 NaN NaN NaN NaN
83681 OWID_EUR NaN Europe 2020-02-15 86.0 1.0 2.857 2.0 2.0 0.286 0.115 0.001 0.004 0.003 0.003 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7.448078e+08 NaN NaN NaN NaN
25798 BEL Europe Belgium 2021-11-13 1479519.0 5053.0 9680.143 26470.0 32.0 26.143 126932.805 433.514 830.491 2270.948 2.745 2.243 1.40 489.0 41.953 2407.0 206.504 NaN NaN 1567.0 134.438 22979081.0 88197.0 1979.007 7.596 80767.0 6.956 0.1370 7.3 tests performed 1.794216e+07 8.820783e+06 8.676959e+06 864824.0 21596.0 23213.0 153.93 75.68 74.44 7.42 1992.0 3579.0 0.031 31.92 375.564 41.8 18.571 12.849 42658.576 0.2 114.898 4.29 25.100 31.400 NaN 5.640 81.63 0.931 1.165592e+07 NaN NaN NaN NaN
258743 SUR South America Suriname 2021-01-12 7008.0 60.0 87.857 133.0 1.0 1.429 11338.962 97.080 142.153 215.194 1.618 2.311 1.10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 39.0 0.064 NaN NaN tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 67.59 3.612 29.6 6.933 4.229 13767.119 NaN 258.314 12.54 7.400 42.900 67.779 3.100 71.68 0.738 6.180460e+05 NaN NaN NaN NaN
196431 MKD Europe North Macedonia 2022-03-20 303354.0 214.0 266.143 9184.0 8.0 4.571 144895.458 102.216 127.122 4386.690 3.821 2.184 0.86 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2134.0 1.015 0.1247 8.0 tests performed 1.836421e+06 NaN 8.351510e+05 148686.0 NaN 485.0 87.72 NaN 39.89 7.10 232.0 14.0 0.001 NaN 82.600 39.1 13.260 8.160 13111.214 5.0 322.688 10.08 NaN NaN NaN 4.280 75.80 0.774 2.093606e+06 NaN NaN NaN NaN
37900 VGB North America British Virgin Islands 2022-04-04 6141.0 0.0 5.286 62.0 0.0 0.000 195997.702 0.000 168.700 1978.808 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11.0 NaN NaN NaN NaN 351.0 4.0 0.013 NaN 207.973 NaN NaN NaN NaN NaN NaN 13.67 NaN NaN NaN NaN 79.07 NaN 3.133200e+04 NaN NaN NaN NaN
232443 WSM Oceania Samoa 2020-11-11 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 69.413 22.0 5.606 3.564 6021.557 NaN 348.977 9.21 16.700 38.100 NaN NaN 73.32 0.715 2.223900e+05 NaN NaN NaN NaN
279616 TUV Oceania Tuvalu 2022-07-20 8.0 0.0 0.000 NaN 0.0 0.000 705.779 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 220.0 NaN NaN NaN NaN 19409.0 5.0 0.044 NaN 373.067 NaN NaN NaN 3575.104 3.3 NaN 27.25 NaN NaN NaN NaN 67.57 NaN 1.133500e+04 NaN NaN NaN NaN
12009 ARM Asia Armenia 2020-02-19 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN 73.0 5.0 0.026 0.002 1.0 0.000 0.0000 NaN tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 102.931 35.7 11.232 7.571 8787.580 1.8 341.010 7.11 1.500 52.100 94.043 4.200 75.09 0.776 2.780472e+06 NaN NaN NaN NaN
35803 BWA Africa Botswana 2023-01-24 329214.0 0.0 45.143 2792.0 0.0 0.286 125162.149 0.000 17.163 1061.476 0.000 0.109 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1444.0 NaN NaN NaN NaN 549.0 307.0 0.012 NaN 4.044 25.8 3.941 2.242 15807.374 NaN 237.372 4.81 5.700 34.400 NaN 1.800 69.59 0.735 2.630300e+06 NaN NaN NaN NaN
64200 CUW North America Curacao 2022-03-23 39853.0 293.0 41.857 265.0 0.0 0.000 208465.631 1532.643 218.949 1386.179 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.0 0.068 NaN NaN tests performed 2.462320e+05 1.074000e+05 9.844100e+04 40391.0 79.0 60.0 128.80 56.18 51.49 21.13 314.0 12.0 0.006 NaN 362.644 41.7 16.367 10.068 NaN NaN NaN 11.62 NaN NaN NaN NaN 78.88 NaN 1.911730e+05 NaN NaN NaN NaN
79444 ERI Africa Eritrea 2021-08-15 6601.0 1.0 3.571 37.0 1.0 0.286 1791.782 0.271 0.969 10.043 0.271 0.078 0.80 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 44.304 19.3 3.607 2.171 1510.459 NaN 311.110 6.05 0.200 11.400 NaN 0.700 66.32 0.459 3.684041e+06 NaN NaN NaN NaN
20160 BHS North America Bahamas 2022-10-21 37342.0 0.0 1.714 833.0 0.0 0.000 91080.492 0.000 4.181 2031.762 0.000 0.000 0.50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.629680e+05 1.735290e+05 1.653170e+05 35557.0 NaN 41.0 88.53 42.33 40.32 8.67 100.0 15.0 0.004 20.37 39.497 34.3 8.996 5.200 27717.847 NaN 235.954 13.17 3.100 20.400 NaN 2.900 73.92 0.814 4.099890e+05 NaN NaN NaN NaN
227361 LCA North America Saint Lucia 2020-01-18 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 293.187 34.9 9.721 6.405 12951.839 NaN 204.620 11.62 NaN NaN 87.202 1.300 76.20 0.759 1.798720e+05 NaN NaN NaN NaN
112117 GNB Africa Guinea-Bissau 2022-08-31 8796.0 0.0 43.571 175.0 0.0 0.000 4177.471 0.000 20.693 83.112 0.000 0.000 0.01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.0 NaN NaN NaN NaN 4.0 8.0 0.000 NaN 66.191 19.4 3.002 1.565 1548.675 67.1 382.474 2.42 NaN NaN 6.403 NaN 58.32 0.480 2.105580e+06 NaN NaN NaN NaN
40681 BFA Africa Burkina Faso 2020-01-18 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 70.151 17.6 2.409 1.358 1703.102 43.7 269.048 2.42 1.600 23.900 11.877 0.400 61.58 0.452 2.267376e+07 NaN NaN NaN NaN
137703 KIR Oceania Kiribati 2020-10-05 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 22.22 143.701 23.2 3.895 2.210 1981.132 NaN 434.657 22.66 35.900 58.900 NaN 1.900 68.37 0.630 1.312370e+05 NaN NaN NaN NaN
97401 GEO Asia Georgia 2021-09-01 553697.0 3886.0 3664.857 7482.0 74.0 76.143 147873.950 1037.821 978.761 1998.192 19.763 20.335 0.86 NaN NaN NaN NaN NaN NaN NaN NaN 6491731.0 52162.0 1727.452 13.880 39147.0 10.417 0.0936 10.7 tests performed 1.235386e+06 8.136010e+05 4.217850e+05 NaN 24640.0 24220.0 32.99 21.73 11.26 NaN 6468.0 11356.0 0.303 50.93 65.032 38.7 14.864 10.244 9745.079 4.2 496.218 7.11 5.300 55.500 NaN 2.600 73.77 0.812 3.744385e+06 NaN NaN NaN NaN
12727 ARM Asia Armenia 2022-02-06 389957.0 2467.0 3360.571 8086.0 5.0 5.714 140248.490 887.259 1208.633 2908.139 1.798 2.055 1.27 NaN NaN NaN NaN NaN NaN NaN NaN 2784379.0 4803.0 997.637 1.721 8043.0 2.882 0.4230 2.4 tests performed 1.925556e+06 1.054178e+06 8.589810e+05 12397.0 NaN 6020.0 69.25 37.91 30.89 0.45 2165.0 3133.0 0.113 NaN 102.931 35.7 11.232 7.571 8787.580 1.8 341.010 7.11 1.500 52.100 94.043 4.200 75.09 0.776 2.780472e+06 NaN NaN NaN NaN
11129 ARG South America Argentina 2020-12-31 1674319.0 6969.0 8183.000 48271.0 113.0 150.000 36789.872 153.130 179.805 1060.660 2.483 3.296 1.21 NaN NaN NaN NaN NaN NaN NaN NaN 5126379.0 34858.0 113.223 0.770 34273.0 0.757 0.2170 4.6 tests performed 4.340000e+04 4.339200e+04 7.000000e+00 1.0 2806.0 11454.0 0.10 0.10 0.00 0.00 252.0 11451.0 0.025 79.17 16.177 31.9 11.198 7.441 18933.907 0.6 191.032 5.50 16.200 27.700 NaN 5.000 76.67 0.845 4.551032e+07 36108.2000 10.57 19.65 801.76245
114051 HTI North America Haiti 2021-05-30 15045.0 114.0 131.000 321.0 0.0 2.714 1298.662 9.840 11.308 27.708 0.000 0.234 1.27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 50.93 398.448 24.3 4.800 2.954 1653.173 23.5 430.548 6.65 2.900 23.100 22.863 0.700 64.00 0.510 1.158500e+07 NaN NaN NaN NaN
52250 CHL South America Chile 2022-04-02 3476914.0 5978.0 6000.857 56637.0 57.0 59.286 177359.764 304.942 306.108 2889.092 2.908 3.024 0.62 546.0 27.852 NaN NaN 128.0 6.529 668.0 34.075 35407400.0 71064.0 1816.399 3.646 67343.0 3.455 0.0831 12.0 tests performed 5.087079e+07 1.788439e+07 1.740085e+07 16160143.0 8895.0 42747.0 259.50 91.23 88.76 82.43 2181.0 1719.0 0.009 28.78 24.282 35.4 11.087 6.938 22767.037 1.3 127.993 8.46 34.200 41.500 NaN 2.110 80.18 0.851 1.960374e+07 NaN NaN NaN NaN
217718 PRI North America Puerto Rico 2023-02-12 1090881.0 799.0 609.857 5750.0 10.0 5.429 335406.769 245.664 187.509 1767.919 3.075 1.669 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 376.232 38.2 15.168 9.829 35044.670 NaN 108.094 12.90 NaN NaN NaN NaN 80.10 NaN 3.252412e+06 10050.1980 10.59 26.10 3082.58500
7194 AGO Africa Angola 2020-01-21 NaN 0.0 0.000 NaN 0.0 0.000 NaN 0.000 0.000 NaN 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 23.890 16.8 2.405 1.362 5819.495 NaN 276.045 3.94 NaN NaN 26.664 NaN 61.15 0.581 3.558900e+07 NaN NaN NaN NaN
215875 PRT Europe Portugal 2021-05-06 824281.0 553.0 527.286 16974.0 2.0 1.429 80254.355 53.842 51.338 1652.637 0.195 0.139 0.95 77.0 7.497 283.0 27.554 NaN NaN NaN NaN 10839340.0 48583.0 1053.375 4.721 45025.0 4.376 0.0077 129.1 tests performed NaN NaN NaN NaN NaN 82363.0 NaN NaN NaN NaN 8019.0 54695.0 0.533 72.22 112.371 46.2 21.502 14.924 27936.896 0.5 127.842 9.85 16.300 30.000 NaN 3.390 82.05 0.864 1.027086e+07 NaN NaN NaN NaN
161470 MLI Africa Mali 2021-06-18 14364.0 5.0 5.000 523.0 0.0 0.286 635.755 0.221 0.221 23.148 0.000 0.013 0.95 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1873.0 NaN NaN NaN NaN 83.0 1550.0 0.007 44.44 15.196 16.4 2.519 1.486 2014.306 NaN 268.024 2.42 1.600 23.000 52.232 0.100 59.31 0.434 2.259360e+07 NaN NaN NaN NaN
183801 NPL Asia Nepal 2020-05-19 402.0 27.0 26.429 2.0 0.0 0.286 13.160 0.884 0.865 0.065 0.000 0.009 1.66 NaN NaN NaN NaN NaN NaN NaN NaN 33006.0 2282.0 1.099 0.076 2006.0 0.067 0.0130 76.9 samples tested NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 92.59 204.430 25.0 5.809 3.212 2442.804 15.0 260.797 7.26 9.500 37.800 47.782 0.300 70.78 0.602 3.054759e+07 NaN NaN NaN NaN
99607 GHA Africa Ghana 2021-02-27 84023.0 409.0 466.286 607.0 11.0 3.571 2509.957 12.218 13.929 18.132 0.329 0.107 1.02 NaN NaN NaN NaN NaN NaN NaN NaN 906827.0 4014.0 27.619 0.122 4284.0 0.130 0.0977 10.2 tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 44.44 126.719 21.1 3.385 1.948 4227.630 12.0 298.245 4.97 0.300 7.700 41.047 0.900 64.07 0.611 3.347587e+07 NaN NaN NaN NaN
290560 UZB Asia Uzbekistan 2023-01-16 250261.0 17.0 31.714 1637.0 0.0 0.000 7227.202 0.491 0.916 47.274 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 22772.0 NaN NaN NaN NaN 658.0 1105.0 0.003 NaN 76.134 28.2 4.469 2.873 6253.104 NaN 724.417 7.57 1.300 24.700 NaN 4.000 71.72 0.720 3.462765e+07 NaN NaN NaN NaN
247062 SVN Europe Slovenia 2021-10-17 308254.0 632.0 954.143 5031.0 4.0 4.857 145413.599 298.135 450.101 2373.289 1.887 2.291 1.27 121.0 57.080 411.0 193.882 56.0 26.576 260.0 122.439 1651831.0 1357.0 779.382 0.640 4305.0 2.031 0.2270 4.4 tests performed 2.167418e+06 1.178841e+06 1.097277e+06 31999.0 39.0 4572.0 102.24 55.61 51.76 1.51 2157.0 673.0 0.032 42.78 102.619 44.5 19.062 12.930 31400.840 NaN 153.493 7.25 20.100 25.000 NaN 4.500 81.32 0.917 2.119843e+06 3769.6003 9.94 13.22 1778.61000
136719 KEN Africa Kenya 2021-05-05 160904.0 345.0 487.429 2805.0 24.0 20.000 2978.188 6.386 9.022 51.918 0.444 0.370 0.79 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3991.0 0.075 0.1098 9.1 tests performed NaN NaN NaN NaN NaN 5159.0 NaN NaN NaN NaN 95.0 5159.0 0.010 74.07 87.324 20.0 2.686 1.528 2993.028 36.8 218.637 2.92 1.200 20.400 24.651 1.400 66.70 0.601 5.402748e+07 NaN NaN NaN NaN
283332 ARE Asia United Arab Emirates 2022-11-25 1043390.0 224.0 216.571 2348.0 0.0 0.000 110515.279 23.726 22.939 248.699 0.000 0.000 0.85 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.89 112.442 34.0 1.144 0.526 67293.483 NaN 317.840 17.26 1.200 37.400 NaN 1.200 77.97 0.890 9.441138e+06 NaN NaN NaN NaN
298468 OWID_WRL NaN World 2022-01-12 314730790.0 3580417.0 2675469.000 5527310.0 7971.0 6946.714 39464.156 448.949 335.478 693.070 0.999 0.871 1.29 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9.573635e+09 4.680114e+09 3.999404e+09 830002800.0 35773408.0 34878426.0 120.04 58.68 50.15 10.41 4373.0 9733993.0 0.122 NaN 58.045 30.9 8.696 5.355 15469.207 10.0 233.070 8.51 6.434 34.635 60.130 2.705 72.58 0.737 7.975105e+09 NaN NaN NaN NaN
211755 PER South America Peru 2023-03-01 4485753.0 69.0 134.714 219431.0 2.0 11.429 131741.770 2.026 3.956 6444.454 0.059 0.336 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.805009e+07 3.036484e+07 2.855090e+07 29134346.0 18820.0 20898.0 258.59 89.18 83.85 85.56 614.0 2214.0 0.007 NaN 25.129 29.1 7.151 4.455 12236.706 3.5 85.755 5.95 4.800 NaN NaN 1.600 76.74 0.777 3.404959e+07 NaN NaN NaN NaN
df.columns
Index(['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases',
       'new_cases_smoothed', 'total_deaths', 'new_deaths',
       'new_deaths_smoothed', 'total_cases_per_million',
       'new_cases_per_million', 'new_cases_smoothed_per_million',
       'total_deaths_per_million', 'new_deaths_per_million',
       'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients',
       'icu_patients_per_million', 'hosp_patients',
       'hosp_patients_per_million', 'weekly_icu_admissions',
       'weekly_icu_admissions_per_million', 'weekly_hosp_admissions',
       'weekly_hosp_admissions_per_million', 'total_tests', 'new_tests',
       'total_tests_per_thousand', 'new_tests_per_thousand',
       'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
       'positive_rate', 'tests_per_case', 'tests_units', 'total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated', 'total_boosters',
       'new_vaccinations', 'new_vaccinations_smoothed',
       'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred',
       'people_fully_vaccinated_per_hundred', 'total_boosters_per_hundred',
       'new_vaccinations_smoothed_per_million',
       'new_people_vaccinated_smoothed',
       'new_people_vaccinated_smoothed_per_hundred', 'stringency_index',
       'population_density', 'median_age', 'aged_65_older', 'aged_70_older',
       'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate',
       'diabetes_prevalence', 'female_smokers', 'male_smokers',
       'handwashing_facilities', 'hospital_beds_per_thousand',
       'life_expectancy', 'human_development_index', 'population',
       'excess_mortality_cumulative_absolute', 'excess_mortality_cumulative',
       'excess_mortality', 'excess_mortality_cumulative_per_million'],
      dtype='object')
# Descriptive statistics
df.describe(include='all')
iso_code continent location date total_cases new_cases new_cases_smoothed total_deaths new_deaths new_deaths_smoothed total_cases_per_million new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate icu_patients icu_patients_per_million hosp_patients hosp_patients_per_million weekly_icu_admissions weekly_icu_admissions_per_million weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units total_vaccinations people_vaccinated people_fully_vaccinated total_boosters new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred people_vaccinated_per_hundred people_fully_vaccinated_per_hundred total_boosters_per_hundred new_vaccinations_smoothed_per_million new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred stringency_index population_density median_age aged_65_older aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand life_expectancy human_development_index population excess_mortality_cumulative_absolute excess_mortality_cumulative excess_mortality excess_mortality_cumulative_per_million
count 302512 288160 302512 302512 2.667710e+05 2.940640e+05 2.928000e+05 2.462140e+05 294139.000000 292909.000000 266771.000000 294064.000000 292800.000000 246214.000000 294139.000000 292909.000000 184817.000000 34764.000000 34764.000000 35138.000000 35138.00000 9101.000000 9101.000000 21287.00000 21287.000000 7.938700e+04 7.540300e+04 79387.000000 75403.000000 1.039650e+05 103965.000000 95927.000000 9.434800e+04 106788 7.356100e+04 7.041100e+04 6.814900e+04 4.232400e+04 6.054200e+04 1.635360e+05 73561.000000 70411.000000 68149.000000 42324.000000 163536.000000 1.635870e+05 163587.000000 193194.000000 256703.000000 238751.000000 230391.000000 236359.000000 233979.000000 150700.000000 234406.000000 246348.000000 175815.000000 173423.000000 114817.000000 206911.000000 278219.000000 227212.000000 3.025120e+05 1.029500e+04 10295.000000 10295.000000 10295.000000
unique 255 6 255 1198 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
top ARG Africa Argentina 2022-04-20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
freq 1198 68173 1198 255 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 80099 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
mean NaN NaN NaN NaN 5.525632e+06 1.100018e+04 1.104556e+04 7.890977e+04 97.976117 98.374032 84395.533693 165.822440 166.511964 793.821846 1.040459 1.044517 0.911495 722.296657 17.189731 4240.497581 143.08819 376.183936 11.395732 4580.91732 92.829572 2.110457e+07 6.728541e+04 924.254762 3.272466 1.421784e+05 2.826309 0.098163 2.403633e+03 NaN 3.631105e+08 1.624038e+08 1.437811e+08 8.740611e+07 8.565426e+05 3.334718e+05 113.179679 50.438996 45.558880 32.186728 2159.949479 1.232478e+05 0.087312 43.477439 412.146751 30.510652 8.699751 5.499055 19018.946420 13.848086 264.274957 8.561093 10.790064 32.909646 50.789341 3.097013 73.718480 0.722471 1.280398e+08 4.727286e+04 9.535368 12.996518 1453.830857
std NaN NaN NaN NaN 3.465076e+07 1.043446e+05 1.016488e+05 4.087464e+05 606.914602 597.602496 134636.639039 1134.538414 642.891130 1039.499140 4.736573 2.947081 0.399925 2255.245237 23.556057 10438.204914 158.65274 546.810269 14.262871 11486.93403 91.369225 8.409869e+07 2.477340e+05 2195.428504 9.033843 1.138215e+06 7.308233 0.115978 3.344366e+04 NaN 1.386860e+09 6.183925e+08 5.601489e+08 3.112422e+08 3.430025e+06 2.093911e+06 83.567228 29.963802 29.531304 29.733811 3307.710186 8.514408e+05 0.188333 24.400287 1881.833423 9.083308 6.092702 4.134639 20012.002263 20.091626 120.931019 4.941349 10.779392 13.574672 31.957428 2.548380 7.397441 0.148991 6.594467e+08 1.377826e+05 13.082029 26.634303 1830.272458
min NaN NaN NaN NaN 1.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.070000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.00000 0.000000 0.000000e+00 1.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 1.000000e+00 NaN 0.000000e+00 0.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.137000 15.100000 1.144000 0.526000 661.240000 0.100000 79.370000 0.990000 0.100000 7.700000 1.188000 0.100000 53.280000 0.394000 4.700000e+01 -3.772610e+04 -44.230000 -95.920000 -1984.281600
25% NaN NaN NaN NaN 6.265000e+03 0.000000e+00 1.286000e+00 1.180000e+02 0.000000 0.000000 1889.971500 0.000000 0.294000 47.650000 0.000000 0.000000 0.720000 25.000000 3.049000 259.000000 38.85600 30.000000 2.625000 280.00000 29.891500 3.646540e+05 2.244000e+03 43.585500 0.286000 1.486000e+03 0.203000 0.017000 7.100000e+00 NaN 1.432613e+06 8.311475e+05 7.180020e+05 2.976735e+05 3.264000e+03 4.100000e+02 33.780000 22.960000 16.270000 3.220000 198.000000 7.900000e+01 0.003000 22.770000 37.728000 22.200000 3.526000 2.085000 3823.194000 0.600000 175.695000 5.350000 1.900000 22.600000 20.859000 1.300000 69.590000 0.602000 4.490020e+05 2.185000e+01 0.420000 -1.040000 15.453504
50% NaN NaN NaN NaN 5.986600e+04 1.900000e+01 4.000000e+01 1.193000e+03 0.000000 0.286000 19249.100000 2.808000 11.391000 324.971000 0.000000 0.038000 0.950000 112.000000 7.797000 858.000000 90.40100 142.000000 6.198000 960.00000 68.190000 2.067330e+06 8.783000e+03 234.141000 0.971000 6.570000e+03 0.851000 0.055000 1.750000e+01 NaN 1.087405e+07 5.337184e+06 4.703255e+06 3.554642e+06 2.856750e+04 4.893000e+03 111.170000 58.110000 52.230000 28.680000 904.000000 1.159000e+03 0.022000 43.215000 90.672000 29.700000 6.378000 3.871000 12294.876000 2.500000 245.465000 7.200000 6.300000 33.100000 49.839000 2.500000 75.050000 0.740000 5.882259e+06 4.464499e+03 7.750000 6.750000 881.367860
75% NaN NaN NaN NaN 6.149885e+05 5.610000e+02 6.530000e+02 1.038675e+04 6.000000 7.000000 102486.778500 73.163500 108.147500 1214.197000 0.455500 0.794000 1.140000 481.000000 21.477750 3364.000000 188.27175 497.000000 14.782000 4344.00000 125.494500 1.024845e+07 3.722900e+04 894.374500 2.914000 3.220500e+04 2.584000 0.138100 5.460000e+01 NaN 7.945056e+07 3.930546e+07 3.161463e+07 2.601720e+07 2.325775e+05 3.853575e+04 181.320000 76.380000 71.920000 55.100000 2910.000000 1.235000e+04 0.093000 62.500000 222.873000 38.700000 13.928000 8.643000 27216.445000 21.400000 333.436000 10.790000 19.300000 41.300000 83.241000 4.200000 79.460000 0.829000 2.830170e+07 3.174990e+04 15.520000 18.545000 2372.123350
max NaN NaN NaN NaN 7.627904e+08 7.460100e+06 6.410233e+06 6.897012e+06 20005.000000 14578.571000 731762.140000 228872.025000 37241.781000 6457.229000 603.656000 148.641000 5.870000 28891.000000 180.675000 154497.000000 1526.84600 4838.000000 224.976000 153977.00000 708.120000 9.214000e+09 3.585563e+07 32925.826000 531.062000 1.476998e+07 147.603000 1.000000 1.023632e+06 NaN 1.336812e+10 5.572334e+09 5.127133e+09 2.759853e+09 4.967301e+07 4.369274e+07 406.430000 129.070000 126.890000 150.470000 117113.000000 2.107109e+07 11.711000 100.000000 20546.766000 48.200000 27.049000 18.493000 116935.600000 77.600000 724.417000 30.530000 44.000000 78.100000 100.000000 13.800000 86.750000 0.957000 7.975105e+09 1.282260e+06 76.550000 377.040000 10329.523000

📌 Step 5: Handling Missing Values

# Check for missing values
df.isnull().sum().sort_values(ascending=False) # this will show the number of null values in each column in descending order
0
weekly_icu_admissions 293411
weekly_icu_admissions_per_million 293411
excess_mortality_cumulative_absolute 292217
excess_mortality_cumulative_per_million 292217
excess_mortality_cumulative 292217
excess_mortality 292217
weekly_hosp_admissions_per_million 281225
weekly_hosp_admissions 281225
icu_patients_per_million 267748
icu_patients 267748
hosp_patients_per_million 267374
hosp_patients 267374
total_boosters 260188
total_boosters_per_hundred 260188
new_vaccinations 241970
people_fully_vaccinated 234363
people_fully_vaccinated_per_hundred 234363
people_vaccinated 232101
people_vaccinated_per_hundred 232101
total_vaccinations_per_hundred 228951
total_vaccinations 228951
new_tests 227109
new_tests_per_thousand 227109
total_tests 223125
total_tests_per_thousand 223125
tests_per_case 208164
positive_rate 206585
new_tests_smoothed_per_thousand 198547
new_tests_smoothed 198547
tests_units 195724
handwashing_facilities 187695
extreme_poverty 151812
new_vaccinations_smoothed_per_million 138976
new_vaccinations_smoothed 138976
new_people_vaccinated_smoothed_per_hundred 138925
new_people_vaccinated_smoothed 138925
male_smokers 129089
female_smokers 126697
reproduction_rate 117695
stringency_index 109318
hospital_beds_per_thousand 95601
human_development_index 75300
aged_65_older 72121
gdp_per_capita 68533
cardiovasc_death_rate 68106
aged_70_older 66153
median_age 63761
total_deaths 56298
total_deaths_per_million 56298
diabetes_prevalence 56164
population_density 45809
total_cases_per_million 35741
total_cases 35741
life_expectancy 24293
continent 14352
new_cases_smoothed 9712
new_cases_smoothed_per_million 9712
new_deaths_smoothed_per_million 9603
new_deaths_smoothed 9603
new_cases 8448
new_cases_per_million 8448
new_deaths 8373
new_deaths_per_million 8373
population 0
date 0
location 0
iso_code 0

(df.isnull().sum() / len(df) * 100).sort_values(ascending=False) # this will show the percentage of null values in each column
0
weekly_icu_admissions 96.991524
weekly_icu_admissions_per_million 96.991524
excess_mortality_cumulative_absolute 96.596829
excess_mortality_cumulative_per_million 96.596829
excess_mortality_cumulative 96.596829
excess_mortality 96.596829
weekly_hosp_admissions_per_million 92.963254
weekly_hosp_admissions 92.963254
icu_patients_per_million 88.508224
icu_patients 88.508224
hosp_patients_per_million 88.384593
hosp_patients 88.384593
total_boosters 86.009150
total_boosters_per_hundred 86.009150
new_vaccinations 79.986910
people_fully_vaccinated 77.472299
people_fully_vaccinated_per_hundred 77.472299
people_vaccinated 76.724560
people_vaccinated_per_hundred 76.724560
total_vaccinations_per_hundred 75.683279
total_vaccinations 75.683279
new_tests 75.074377
new_tests_per_thousand 75.074377
total_tests 73.757405
total_tests_per_thousand 73.757405
tests_per_case 68.811816
positive_rate 68.289853
new_tests_smoothed_per_thousand 65.632768
new_tests_smoothed 65.632768
tests_units 64.699582
handwashing_facilities 62.045473
extreme_poverty 50.183794
new_vaccinations_smoothed_per_million 45.940657
new_vaccinations_smoothed 45.940657
new_people_vaccinated_smoothed_per_hundred 45.923798
new_people_vaccinated_smoothed 45.923798
male_smokers 42.672357
female_smokers 41.881644
reproduction_rate 38.905895
stringency_index 36.136748
hospital_beds_per_thousand 31.602383
human_development_index 24.891575
aged_65_older 23.840707
gdp_per_capita 22.654638
cardiovasc_death_rate 22.513487
aged_70_older 21.867893
median_age 21.077180
total_deaths 18.610171
total_deaths_per_million 18.610171
diabetes_prevalence 18.565875
population_density 15.142870
total_cases_per_million 11.814738
total_cases 11.814738
life_expectancy 8.030425
continent 4.744275
new_cases_smoothed 3.210451
new_cases_smoothed_per_million 3.210451
new_deaths_smoothed_per_million 3.174420
new_deaths_smoothed 3.174420
new_cases 2.792616
new_cases_per_million 2.792616
new_deaths 2.767824
new_deaths_per_million 2.767824
population 0.000000
date 0.000000
location 0.000000
iso_code 0.000000
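
The percentages above show several columns that are almost entirely empty (for example, weekly_icu_admissions at roughly 97% missing). Imputing such columns mostly produces invented values, so one option, not taken in this notebook, is to drop them before filling. The following is a minimal sketch under stated assumptions: `drop_sparse_columns` is a hypothetical helper, the threshold is arbitrary, and the demo frame uses made-up values rather than rows from the dataset.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, max_missing_pct: float = 80.0) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose missing share exceeds the threshold."""
    missing_pct = frame.isnull().mean() * 100  # percentage of NaN per column
    keep = missing_pct[missing_pct <= max_missing_pct].index
    return frame[keep].copy()


# Tiny illustrative frame (made-up values, not taken from the dataset)
demo = pd.DataFrame({
    "location": ["A", "A", "B", "B", "B"],
    "new_cases": [1.0, np.nan, 3.0, 4.0, 5.0],
    "weekly_icu_admissions": [np.nan, np.nan, np.nan, np.nan, 2.0],
})

# weekly_icu_admissions is 80% missing here, so a 50% threshold removes it
trimmed = drop_sparse_columns(demo, max_missing_pct=50.0)
print(trimmed.columns.tolist())
```

On the real data the call would look like `df = drop_sparse_columns(df, max_missing_pct=80.0)`; whether 80% is the right cut-off depends on which indicators the later analysis needs.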

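# Forward fill, then backward fill, to close gaps in the time series.
# Note: the fill runs over the whole DataFrame, so values can carry across location boundaries.
# DataFrame.ffill()/bfill() replace the deprecated fillna(method=...) form.
df.ffill(inplace=True)
df.bfill(inplace=True)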

# Separate numerical and categorical columns
num_cols = df.select_dtypes(include=['number']).columns
cat_cols = df.select_dtypes(include=['object']).columns

# Fallback: fill any numerical gaps that survive ffill/bfill with the column median
df[num_cols] = df[num_cols].apply(lambda x: x.fillna(x.median()))

# Fill categorical columns with mode
df[cat_cols] = df[cat_cols].apply(lambda x: x.fillna(x.mode()[0]))

# Check if all missing values are handled
print("Total remaining missing values:")
print(df.isnull().sum().sum())  # Should be 0
Total remaining missing values:
0
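
The forward/backward fill in the cell above runs over the whole DataFrame, so a gap at the start of one location's series is filled with the last value of the previous location. A possible alternative, shown here only as a sketch and not as the method used in this notebook, is to fill within each location. The demo frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up example: two locations, three dates each
demo = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"] * 2,
    "total_cases": [10.0, np.nan, 30.0, np.nan, 5.0, np.nan],
})

# Global fill: B's first row inherits A's last value (30.0)
global_fill = demo.ffill().bfill()

# Grouped fill: each location is filled only from its own rows
grouped_fill = demo.sort_values(["location", "date"]).copy()
grouped_fill["total_cases"] = (
    grouped_fill.groupby("location")["total_cases"]
    .transform(lambda s: s.ffill().bfill())
)

print(global_fill["total_cases"].tolist())
print(grouped_fill["total_cases"].tolist())
```

With the grouped version, a column that is entirely empty for a location stays NaN for that location, so a later fallback (such as the median fill) would still be needed.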

📌 Step 6: Identifying & Removing Duplicates

# Check for duplicate rows
duplicates = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")
Number of duplicate rows: 0
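
`df.duplicated()` only flags rows that match in every column. For a panel like this one, it can also be worth checking whether any (location, date) pair appears more than once, even if the measurements differ. A minimal sketch on made-up data; treating `location` and `date` as the key is an assumption about how the dataset is meant to be indexed.

```python
import pandas as pd

# Made-up example: the second and third rows share a key but differ in value
demo = pd.DataFrame({
    "location": ["A", "A", "A", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-02", "2021-01-01"],
    "new_cases": [1.0, 2.0, 3.0, 4.0],
})

full_row_dupes = demo.duplicated().sum()
# keep=False marks every row that takes part in a key collision
key_dupes = demo.duplicated(subset=["location", "date"], keep=False)

print(f"Full-row duplicates: {full_row_dupes}")
print(f"Rows sharing a (location, date) key: {key_dupes.sum()}")
print(demo[key_dupes])
```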
# Look for any columns with invalid data types or strange values
print(df.dtypes)
iso_code                                       object
continent                                      object
location                                       object
date                                           object
total_cases                                   float64
new_cases                                     float64
new_cases_smoothed                            float64
total_deaths                                  float64
new_deaths                                    float64
new_deaths_smoothed                           float64
total_cases_per_million                       float64
new_cases_per_million                         float64
new_cases_smoothed_per_million                float64
total_deaths_per_million                      float64
new_deaths_per_million                        float64
new_deaths_smoothed_per_million               float64
reproduction_rate                             float64
icu_patients                                  float64
icu_patients_per_million                      float64
hosp_patients                                 float64
hosp_patients_per_million                     float64
weekly_icu_admissions                         float64
weekly_icu_admissions_per_million             float64
weekly_hosp_admissions                        float64
weekly_hosp_admissions_per_million            float64
total_tests                                   float64
new_tests                                     float64
total_tests_per_thousand                      float64
new_tests_per_thousand                        float64
new_tests_smoothed                            float64
new_tests_smoothed_per_thousand               float64
positive_rate                                 float64
tests_per_case                                float64
tests_units                                    object
total_vaccinations                            float64
people_vaccinated                             float64
people_fully_vaccinated                       float64
total_boosters                                float64
new_vaccinations                              float64
new_vaccinations_smoothed                     float64
total_vaccinations_per_hundred                float64
people_vaccinated_per_hundred                 float64
people_fully_vaccinated_per_hundred           float64
total_boosters_per_hundred                    float64
new_vaccinations_smoothed_per_million         float64
new_people_vaccinated_smoothed                float64
new_people_vaccinated_smoothed_per_hundred    float64
stringency_index                              float64
population_density                            float64
median_age                                    float64
aged_65_older                                 float64
aged_70_older                                 float64
gdp_per_capita                                float64
extreme_poverty                               float64
cardiovasc_death_rate                         float64
diabetes_prevalence                           float64
female_smokers                                float64
male_smokers                                  float64
handwashing_facilities                        float64
hospital_beds_per_thousand                    float64
life_expectancy                               float64
human_development_index                       float64
population                                    float64
excess_mortality_cumulative_absolute          float64
excess_mortality_cumulative                   float64
excess_mortality                              float64
excess_mortality_cumulative_per_million       float64
dtype: object
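
The listing above shows `date` stored as `object` (plain strings). Before relying on it for time-based work, a quick way to look for strange values is to try parsing it and count what fails. This is a sketch on made-up strings, assuming the `YYYY-MM-DD` format seen in the sample rows.

```python
import pandas as pd

# Made-up example: two valid dates and one malformed entry
demo_dates = pd.Series(["2021-02-27", "2023-01-16", "not a date"], name="date")

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(demo_dates, format="%Y-%m-%d", errors="coerce")
bad_values = demo_dates[parsed.isna()]

print(f"Unparseable dates: {parsed.isna().sum()}")
print(bad_values.tolist())
```

On the real data the same check would be `pd.to_datetime(df['date'], format="%Y-%m-%d", errors="coerce").isna().sum()`.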

📌 Step 7: Checking for Inconsistent Categorical Values

# Identify categorical columns
cat_cols = df.select_dtypes(include=['object']).columns

# Check for inconsistent categorical values
for col in cat_cols:
    print(f"Unique values in '{col}':")
    print(df[col].value_counts(dropna=False))  # Includes NaN counts
    print("-" * 50)  # Separator for readability
Unique values in 'iso_code':
iso_code
ARG         1198
MEX         1198
AFG         1196
PLW         1196
NIC         1196
NER         1196
NGA         1196
NIU         1196
OWID_NAM    1196
PRK         1196
MKD         1196
MNP         1196
NOR         1196
OWID_OCE    1196
OMN         1196
PAK         1196
PSE         1196
NCL         1196
PAN         1196
PNG         1196
PRY         1196
PER         1196
PHL         1196
PCN         1196
POL         1196
PRT         1196
PRI         1196
QAT         1196
REU         1196
ROU         1196
RUS         1196
NZL         1196
NLD         1196
BLM         1196
MRT         1196
LTU         1196
OWID_AFR    1196
OWID_LMC    1196
LUX         1196
MDG         1196
MWI         1196
MYS         1196
MDV         1196
MLI         1196
MLT         1196
MHL         1196
MTQ         1196
MUS         1196
NPL         1196
MYT         1196
FSM         1196
MDA         1196
MCO         1196
MNG         1196
MNE         1196
MSR         1196
MAR         1196
MOZ         1196
MMR         1196
NAM         1196
NRU         1196
RWA         1196
SHN         1196
LBY         1196
TZA         1196
TLS         1196
TGO         1196
TKL         1196
TON         1196
TTO         1196
TUN         1196
TUR         1196
TKM         1196
TCA         1196
TUV         1196
UGA         1196
UKR         1196
ARE         1196
GBR         1196
USA         1196
VIR         1196
OWID_UMC    1196
URY         1196
UZB         1196
VUT         1196
VAT         1196
VEN         1196
VNM         1196
WLF         1196
OWID_WRL    1196
YEM         1196
ZMB         1196
THA         1196
TJK         1196
KNA         1196
SYR         1196
LCA         1196
MAF         1196
SPM         1196
VCT         1196
WSM         1196
SMR         1196
STP         1196
SAU         1196
SEN         1196
SRB         1196
SYC         1196
SLE         1196
SGP         1196
SXM         1196
SVK         1196
SVN         1196
SLB         1196
SOM         1196
ZAF         1196
OWID_SAM    1196
KOR         1196
SSD         1196
ESP         1196
LKA         1196
SDN         1196
SWE         1196
CHE         1196
LIE         1196
OWID_LIC    1196
LBR         1196
COG         1196
BFA         1196
BDI         1196
KHM         1196
CMR         1196
CAN         1196
CPV         1196
CYM         1196
CAF         1196
TCD         1196
CHL         1196
CHN         1196
COL         1196
COM         1196
COK         1196
SLV         1196
CRI         1196
CIV         1196
HRV         1196
CUB         1196
CUW         1196
CYP         1196
CZE         1196
COD         1196
DNK         1196
DJI         1196
DMA         1196
DOM         1196
LSO         1196
BGR         1196
BRN         1196
VGB         1196
BRA         1196
ALB         1196
DZA         1196
ASM         1196
AND         1196
AGO         1196
AIA         1196
ATG         1196
ARM         1196
ABW         1196
OWID_ASI    1196
AUS         1196
AUT         1196
AZE         1196
BHS         1196
BHR         1196
BGD         1196
BRB         1196
BLR         1196
BEL         1196
BLZ         1196
BEN         1196
BMU         1196
BTN         1196
BOL         1196
BES         1196
BIH         1196
BWA         1196
EGY         1196
ECU         1196
ZWE         1196
IDN         1196
GGY         1196
GIN         1196
GNQ         1196
GNB         1196
GUY         1196
HTI         1196
OWID_HIC    1196
HND         1196
KGZ         1196
HUN         1196
ISL         1196
IND         1196
IRN         1196
KWT         1196
IRQ         1196
IRL         1196
IMN         1196
ISR         1196
ITA         1196
JAM         1196
JPN         1196
JEY         1196
JOR         1196
KAZ         1196
KEN         1196
KIR         1196
GTM         1196
OWID_KOS    1196
GUM         1196
GLP         1196
ERI         1196
EST         1196
SWZ         1196
ETH         1196
OWID_EUR    1196
OWID_EUN    1196
FRO         1196
FLK         1196
FJI         1196
LBN         1196
FIN         1196
FRA         1196
GUF         1196
PYF         1196
GAB         1196
GMB         1196
GEO         1196
DEU         1196
GHA         1196
GIB         1196
LVA         1196
LAO         1196
GRC         1196
GRL         1196
GRD         1196
SUR         1195
TWN         1183
HKG         1165
OWID_NIR    1131
OWID_SCT    1123
OWID_ENG    1112
OWID_WLS    1100
MAC          787
OWID_CYN     691
ESH            1
Name: count, dtype: int64
--------------------------------------------------
Unique values in 'continent':
continent
Africa           72957
Europe           69050
Asia             61234
North America    52626
Oceania          29900
South America    16745
Name: count, dtype: int64
--------------------------------------------------
Unique values in 'location':
location
Argentina                           1198
Mexico                              1198
Afghanistan                         1196
Palau                               1196
Nicaragua                           1196
Niger                               1196
Nigeria                             1196
Niue                                1196
North America                       1196
North Korea                         1196
North Macedonia                     1196
Northern Mariana Islands            1196
Norway                              1196
Oceania                             1196
Oman                                1196
Pakistan                            1196
Palestine                           1196
New Caledonia                       1196
Panama                              1196
Papua New Guinea                    1196
Paraguay                            1196
Peru                                1196
Philippines                         1196
Pitcairn                            1196
Poland                              1196
Portugal                            1196
Puerto Rico                         1196
Qatar                               1196
Reunion                             1196
Romania                             1196
Russia                              1196
New Zealand                         1196
Netherlands                         1196
Saint Barthelemy                    1196
Mauritania                          1196
Lithuania                           1196
Africa                              1196
Lower middle income                 1196
Luxembourg                          1196
Madagascar                          1196
Malawi                              1196
Malaysia                            1196
Maldives                            1196
Mali                                1196
Malta                               1196
Marshall Islands                    1196
Martinique                          1196
Mauritius                           1196
Nepal                               1196
Mayotte                             1196
Micronesia (country)                1196
Moldova                             1196
Monaco                              1196
Mongolia                            1196
Montenegro                          1196
Montserrat                          1196
Morocco                             1196
Mozambique                          1196
Myanmar                             1196
Namibia                             1196
Nauru                               1196
Rwanda                              1196
Saint Helena                        1196
Libya                               1196
Tanzania                            1196
Timor                               1196
Togo                                1196
Tokelau                             1196
Tonga                               1196
Trinidad and Tobago                 1196
Tunisia                             1196
Turkey                              1196
Turkmenistan                        1196
Turks and Caicos Islands            1196
Tuvalu                              1196
Uganda                              1196
Ukraine                             1196
United Arab Emirates                1196
United Kingdom                      1196
United States                       1196
United States Virgin Islands        1196
Upper middle income                 1196
Uruguay                             1196
Uzbekistan                          1196
Vanuatu                             1196
Vatican                             1196
Venezuela                           1196
Vietnam                             1196
Wallis and Futuna                   1196
World                               1196
Yemen                               1196
Zambia                              1196
Thailand                            1196
Tajikistan                          1196
Saint Kitts and Nevis               1196
Syria                               1196
Saint Lucia                         1196
Saint Martin (French part)          1196
Saint Pierre and Miquelon           1196
Saint Vincent and the Grenadines    1196
Samoa                               1196
San Marino                          1196
Sao Tome and Principe               1196
Saudi Arabia                        1196
Senegal                             1196
Serbia                              1196
Seychelles                          1196
Sierra Leone                        1196
Singapore                           1196
Sint Maarten (Dutch part)           1196
Slovakia                            1196
Slovenia                            1196
Solomon Islands                     1196
Somalia                             1196
South Africa                        1196
South America                       1196
South Korea                         1196
South Sudan                         1196
Spain                               1196
Sri Lanka                           1196
Sudan                               1196
Sweden                              1196
Switzerland                         1196
Liechtenstein                       1196
Low income                          1196
Liberia                             1196
Congo                               1196
Burkina Faso                        1196
Burundi                             1196
Cambodia                            1196
Cameroon                            1196
Canada                              1196
Cape Verde                          1196
Cayman Islands                      1196
Central African Republic            1196
Chad                                1196
Chile                               1196
China                               1196
Colombia                            1196
Comoros                             1196
Cook Islands                        1196
El Salvador                         1196
Costa Rica                          1196
Cote d'Ivoire                       1196
Croatia                             1196
Cuba                                1196
Curacao                             1196
Cyprus                              1196
Czechia                             1196
Democratic Republic of Congo        1196
Denmark                             1196
Djibouti                            1196
Dominica                            1196
Dominican Republic                  1196
Lesotho                             1196
Bulgaria                            1196
Brunei                              1196
British Virgin Islands              1196
Brazil                              1196
Albania                             1196
Algeria                             1196
American Samoa                      1196
Andorra                             1196
Angola                              1196
Anguilla                            1196
Antigua and Barbuda                 1196
Armenia                             1196
Aruba                               1196
Asia                                1196
Australia                           1196
Austria                             1196
Azerbaijan                          1196
Bahamas                             1196
Bahrain                             1196
Bangladesh                          1196
Barbados                            1196
Belarus                             1196
Belgium                             1196
Belize                              1196
Benin                               1196
Bermuda                             1196
Bhutan                              1196
Bolivia                             1196
Bonaire Sint Eustatius and Saba     1196
Bosnia and Herzegovina              1196
Botswana                            1196
Egypt                               1196
Ecuador                             1196
Zimbabwe                            1196
Indonesia                           1196
Guernsey                            1196
Guinea                              1196
Equatorial Guinea                   1196
Guinea-Bissau                       1196
Guyana                              1196
Haiti                               1196
High income                         1196
Honduras                            1196
Kyrgyzstan                          1196
Hungary                             1196
Iceland                             1196
India                               1196
Iran                                1196
Kuwait                              1196
Iraq                                1196
Ireland                             1196
Isle of Man                         1196
Israel                              1196
Italy                               1196
Jamaica                             1196
Japan                               1196
Jersey                              1196
Jordan                              1196
Kazakhstan                          1196
Kenya                               1196
Kiribati                            1196
Guatemala                           1196
Kosovo                              1196
Guam                                1196
Guadeloupe                          1196
Eritrea                             1196
Estonia                             1196
Eswatini                            1196
Ethiopia                            1196
Europe                              1196
European Union                      1196
Faeroe Islands                      1196
Falkland Islands                    1196
Fiji                                1196
Lebanon                             1196
Finland                             1196
France                              1196
French Guiana                       1196
French Polynesia                    1196
Gabon                               1196
Gambia                              1196
Georgia                             1196
Germany                             1196
Ghana                               1196
Gibraltar                           1196
Latvia                              1196
Laos                                1196
Greece                              1196
Greenland                           1196
Grenada                             1196
Suriname                            1195
Taiwan                              1183
Hong Kong                           1165
Northern Ireland                    1131
Scotland                            1123
England                             1112
Wales                               1100
Macao                                787
Northern Cyprus                      691
Western Sahara                         1
Name: count, dtype: int64
--------------------------------------------------
Unique values in 'date':
date
2022-04-20    255
2021-08-24    254
2021-12-22    254
2021-12-09    254
2021-12-10    254
2021-12-11    254
2021-12-12    254
2021-12-13    254
2021-12-14    254
2021-12-15    254
2021-12-16    254
2021-12-17    254
2021-12-18    254
2021-12-19    254
2021-12-20    254
2021-12-21    254
2021-12-23    254
2021-12-07    254
2021-12-24    254
2021-12-25    254
2021-12-26    254
2021-12-27    254
2021-12-28    254
2021-12-29    254
2021-12-30    254
2021-12-31    254
2022-01-01    254
2022-01-02    254
2022-01-03    254
2022-01-04    254
2022-01-05    254
2022-01-06    254
2021-12-08    254
2021-12-06    254
2022-01-08    254
2021-11-19    254
2021-11-05    254
2021-11-06    254
2021-11-07    254
2021-11-08    254
2021-11-09    254
2021-11-10    254
2021-11-11    254
2021-11-12    254
2021-11-13    254
2021-11-14    254
2021-11-15    254
2021-11-16    254
2021-11-17    254
2021-11-18    254
2021-11-20    254
2021-12-05    254
2021-11-21    254
2021-11-22    254
2021-11-23    254
2021-11-24    254
2021-11-25    254
2021-11-26    254
2021-11-27    254
2021-11-28    254
2021-11-29    254
2021-11-30    254
2021-12-01    254
2021-12-02    254
2021-12-03    254
2021-12-04    254
2022-01-07    254
2022-01-09    254
2021-11-03    254
2022-02-11    254
2022-02-13    254
2022-02-14    254
2022-02-15    254
2022-02-16    254
2022-02-17    254
2022-02-18    254
2022-02-19    254
2022-02-20    254
2022-02-21    254
2022-02-22    254
2022-02-23    254
2022-02-24    254
2022-02-25    254
2022-02-26    254
2022-02-27    254
2022-02-28    254
2022-03-01    254
2022-03-02    254
2022-03-03    254
2022-03-04    254
2022-03-05    254
2022-03-06    254
2022-03-07    254
2022-03-08    254
2022-03-09    254
2022-03-10    254
2022-03-11    254
2022-03-12    254
2022-03-13    254
2022-02-12    254
2022-02-10    254
2022-01-10    254
2022-02-09    254
2022-01-11    254
2022-01-12    254
2022-01-13    254
2022-01-14    254
2022-01-15    254
2022-01-16    254
2022-01-17    254
2022-01-18    254
2022-01-19    254
2022-01-20    254
2022-01-21    254
2022-01-22    254
2022-01-23    254
2022-01-24    254
2022-01-25    254
2022-01-26    254
2022-01-27    254
2022-01-28    254
2022-01-29    254
2022-01-30    254
2022-01-31    254
2022-02-01    254
2022-02-02    254
2022-02-03    254
2022-02-04    254
2022-02-05    254
2022-02-06    254
2022-02-07    254
2022-02-08    254
2021-11-04    254
2021-11-02    254
2022-03-15    254
2021-08-08    254
2021-07-25    254
2021-07-26    254
2021-07-27    254
2021-07-28    254
2021-07-29    254
2021-07-30    254
2021-07-31    254
2021-08-01    254
2021-08-02    254
2021-08-03    254
2021-08-04    254
2021-08-05    254
2021-08-06    254
2021-08-07    254
2021-08-09    254
2021-08-27    254
2021-08-10    254
2021-08-11    254
2021-08-12    254
2021-08-13    254
2021-08-14    254
2021-08-15    254
2021-08-17    254
2021-08-18    254
2021-08-19    254
2021-08-20    254
2021-08-21    254
2021-08-22    254
2021-08-23    254
2021-08-25    254
2021-07-24    254
2021-07-23    254
2021-07-22    254
2021-07-21    254
2021-06-22    254
2021-06-23    254
2021-06-24    254
2021-06-25    254
2021-06-26    254
2021-06-27    254
2021-06-28    254
2021-06-29    254
2021-06-30    254
2021-07-01    254
2021-07-02    254
2021-07-03    254
2021-07-04    254
2021-07-05    254
2021-07-06    254
2021-07-07    254
2021-07-08    254
2021-07-09    254
2021-07-10    254
2021-07-11    254
2021-07-12    254
2021-07-13    254
2021-07-14    254
2021-07-15    254
2021-07-16    254
2021-07-17    254
2021-07-18    254
2021-07-19    254
2021-07-20    254
2021-08-26    254
2021-08-28    254
2021-11-01    254
2021-10-16    254
2021-10-02    254
2021-10-03    254
2021-10-04    254
2021-10-05    254
2021-10-06    254
2021-10-07    254
2021-10-08    254
2021-10-09    254
2021-10-10    254
2021-10-11    254
2021-10-12    254
2021-10-13    254
2021-10-14    254
2021-10-15    254
2021-10-17    254
2021-08-29    254
2021-10-18    254
2021-10-19    254
2021-10-20    254
2021-10-21    254
2021-10-22    254
2021-10-23    254
2021-10-24    254
2021-10-25    254
2021-10-26    254
2021-10-27    254
2021-10-28    254
2021-10-29    254
2021-10-30    254
2021-10-31    254
2021-10-01    254
2021-09-30    254
2021-09-29    254
2021-09-28    254
2021-08-30    254
2021-08-31    254
2021-09-01    254
2021-09-02    254
2021-09-03    254
2021-09-04    254
2021-09-05    254
2021-09-06    254
2021-09-07    254
2021-09-08    254
2021-09-09    254
2021-09-10    254
2021-09-11    254
2021-09-12    254
2021-09-13    254
2021-09-14    254
2021-09-15    254
2021-09-16    254
2021-09-17    254
2021-09-18    254
2021-09-19    254
2021-09-20    254
2021-09-21    254
2021-09-22    254
2021-09-23    254
2021-09-24    254
2021-09-25    254
2021-09-26    254
2021-09-27    254
2022-03-14    254
2022-03-16    254
2021-06-20    254
2022-09-15    254
2022-09-01    254
2022-09-02    254
2022-09-03    254
2022-09-04    254
2022-09-05    254
2022-09-06    254
2022-09-07    254
2022-09-08    254
2022-09-09    254
2022-09-10    254
2022-09-11    254
2022-09-12    254
2022-09-13    254
2022-09-14    254
2022-09-16    254
2022-07-28    254
2022-09-17    254
2022-09-18    254
2022-09-19    254
2022-09-20    254
2022-09-21    254
2022-09-22    254
2022-09-23    254
2022-09-24    254
2022-09-25    254
2022-09-26    254
2022-09-27    254
2022-09-28    254
2022-09-29    254
2022-09-30    254
2022-08-31    254
2022-08-30    254
2022-08-29    254
2022-08-28    254
2022-07-30    254
2022-07-31    254
2022-08-01    254
2022-08-02    254
2022-08-03    254
2022-08-04    254
2022-08-05    254
2022-08-06    254
2022-08-07    254
2022-08-08    254
2022-08-09    254
2022-08-10    254
2022-08-11    254
2022-08-12    254
2022-08-13    254
2022-08-14    254
2022-08-15    254
2022-08-16    254
2022-08-17    254
2022-08-18    254
2022-08-19    254
2022-08-20    254
2022-08-21    254
2022-08-22    254
2022-08-23    254
2022-08-24    254
2022-08-25    254
2022-08-26    254
2022-08-27    254
2022-10-01    254
2022-10-02    254
2022-10-03    254
2022-11-05    254
2022-11-07    254
2022-11-08    254
2022-11-09    254
2022-11-10    254
2022-11-11    254
2022-11-12    254
2022-11-13    254
2022-11-14    254
2022-11-15    254
2022-11-16    254
2022-11-17    254
2022-11-18    254
2022-11-19    254
2022-11-20    254
2022-11-21    254
2022-11-22    254
2022-11-23    254
2022-11-24    254
2022-11-25    254
2022-11-26    254
2022-11-27    254
2022-11-28    254
2022-11-29    254
2022-11-30    254
2022-12-01    254
2022-12-02    254
2022-12-03    254
2022-12-04    254
2022-12-06    254
2022-11-06    254
2022-11-04    254
2022-10-04    254
2022-11-03    254
2022-10-05    254
2022-10-06    254
2022-10-07    254
2022-10-08    254
2022-10-09    254
2022-10-10    254
2022-10-11    254
2022-10-12    254
2022-10-13    254
2022-10-14    254
2022-10-15    254
2022-10-16    254
2022-10-17    254
2022-10-18    254
2022-10-19    254
2022-10-20    254
2022-10-21    254
2022-10-22    254
2022-10-23    254
2022-10-24    254
2022-10-25    254
2022-10-26    254
2022-10-27    254
2022-10-28    254
2022-10-29    254
2022-10-30    254
2022-10-31    254
2022-11-01    254
2022-11-02    254
2022-07-29    254
2022-07-27    254
2022-03-17    254
2022-05-05    254
2022-04-21    254
2022-04-22    254
2022-04-23    254
2022-04-24    254
2022-04-25    254
2022-04-26    254
2022-04-27    254
2022-04-28    254
2022-04-29    254
2022-04-30    254
2022-05-01    254
2022-05-02    254
2022-05-03    254
2022-05-04    254
2022-05-06    254
2022-07-26    254
2022-05-07    254
2022-05-08    254
2022-05-09    254
2022-05-10    254
2022-05-11    254
2022-05-12    254
2022-05-13    254
2022-05-14    254
2022-05-15    254
2022-05-16    254
2022-05-17    254
2022-05-18    254
2022-05-19    254
2022-05-20    254
2022-04-19    254
2022-04-18    254
2022-04-17    254
2022-04-16    254
2022-03-18    254
2022-03-19    254
2022-03-20    254
2022-03-21    254
2022-03-22    254
2022-03-23    254
2022-03-24    254
2022-03-25    254
2022-03-26    254
2022-03-27    254
2022-03-28    254
2022-03-29    254
2022-03-30    254
2022-03-31    254
2022-04-01    254
2022-04-02    254
2022-04-03    254
2022-04-04    254
2022-04-05    254
2022-04-06    254
2022-04-07    254
2022-04-08    254
2022-04-09    254
2022-04-10    254
2022-04-11    254
2022-04-12    254
2022-04-13    254
2022-04-14    254
2022-04-15    254
2022-05-21    254
2022-05-22    254
2022-05-23    254
2022-06-25    254
2022-06-27    254
2022-06-28    254
2022-06-29    254
2022-06-30    254
2022-07-01    254
2022-07-02    254
2022-07-03    254
2022-07-04    254
2022-07-05    254
2022-07-06    254
2022-07-07    254
2022-07-08    254
2022-07-09    254
2022-07-10    254
2022-07-11    254
2022-07-12    254
2022-07-13    254
2022-07-14    254
2022-07-15    254
2022-07-16    254
2022-07-17    254
2022-07-18    254
2022-07-19    254
2022-07-20    254
2022-07-21    254
2022-07-22    254
2022-07-23    254
2022-07-24    254
2022-07-25    254
2022-06-26    254
2022-06-24    254
2022-05-24    254
2022-06-23    254
2022-05-25    254
2022-05-26    254
2022-05-27    254
2022-05-28    254
2022-05-29    254
2022-05-30    254
2022-05-31    254
2022-06-01    254
2022-06-02    254
2022-06-03    254
2022-06-04    254
2022-06-05    254
2022-06-06    254
2022-06-07    254
2022-06-08    254
2022-06-09    254
2022-06-10    254
2022-06-11    254
2022-06-12    254
2022-06-13    254
2022-06-14    254
2022-06-15    254
2022-06-16    254
2022-06-17    254
2022-06-18    254
2022-06-19    254
2022-06-20    254
2022-06-21    254
2022-06-22    254
2021-06-21    254
2021-08-16    254
2021-06-19    254
2021-03-28    254
2021-03-14    254
2021-03-15    254
2021-03-16    254
2021-03-17    254
2021-03-18    254
2021-03-19    254
2021-03-20    254
2021-03-21    254
2021-03-22    254
2021-03-23    254
2021-03-24    254
2021-03-25    254
2021-03-26    254
2021-03-27    254
2021-03-29    254
2021-03-12    254
2021-03-30    254
2021-03-31    254
2021-04-01    254
2021-04-02    254
2021-04-03    254
2021-04-04    254
2021-04-05    254
2021-04-06    254
2021-04-07    254
2021-04-08    254
2021-04-09    254
2021-04-10    254
2021-04-11    254
2021-04-12    254
2021-03-13    254
2021-03-11    254
2021-04-14    254
2021-02-22    254
2021-02-08    254
2021-02-09    254
2021-02-10    254
2021-02-11    254
2021-02-12    254
2021-02-13    254
2021-02-14    254
2021-02-15    254
2021-02-16    254
2021-06-18    254
2021-02-18    254
2021-02-19    254
2021-02-20    254
2021-02-21    254
2021-02-23    254
2021-03-10    254
2021-02-24    254
2021-02-25    254
2021-02-26    254
2021-02-27    254
2021-02-28    254
2021-03-01    254
2021-03-02    254
2021-03-03    254
2021-03-04    254
2021-03-05    254
2021-03-06    254
2021-03-07    254
2021-03-08    254
2021-03-09    254
2021-04-13    254
2021-02-17    254
2021-04-15    254
2021-06-02    254
2021-05-19    254
2021-05-20    254
2021-05-21    254
2021-05-22    254
2021-05-23    254
2021-05-24    254
2021-05-25    254
2021-05-26    254
2021-05-27    254
2021-05-28    254
2021-05-29    254
2021-05-30    254
2021-05-31    254
2021-06-01    254
2021-06-03    254
2021-05-17    254
2021-06-04    254
2021-06-05    254
2021-06-06    254
2021-06-07    254
2021-06-08    254
2021-06-09    254
2021-06-10    254
2021-06-11    254
2021-06-13    254
2021-06-14    254
2021-06-15    254
2021-06-16    254
2021-06-17    254
2021-04-16    254
2021-05-18    254
2021-06-12    254
2021-05-16    254
2021-04-30    254
2021-05-15    254
2021-04-17    254
2021-04-18    254
2021-04-20    254
2021-04-21    254
2021-04-22    254
2021-04-23    254
2021-04-24    254
2021-04-25    254
2021-04-26    254
2021-04-27    254
2021-04-28    254
2021-04-29    254
2021-04-19    254
2021-05-01    254
2021-05-08    254
2021-05-13    254
2021-05-12    254
2021-05-14    254
2021-05-11    254
2021-05-10    254
2021-05-09    254
2021-05-07    254
2021-05-06    254
2021-05-05    254
2021-05-04    254
2021-05-03    254
2021-05-02    254
2023-01-14    253
2023-01-10    253
2023-01-13    253
2023-01-12    253
2023-01-11    253
2023-01-05    253
2023-01-09    253
2023-01-08    253
2023-01-07    253
2023-01-06    253
2023-01-04    253
2023-01-03    253
2023-01-15    253
2023-01-21    253
2023-01-16    253
2023-01-17    253
2023-01-18    253
2023-01-19    253
2023-01-20    253
2023-01-22    253
2023-01-23    253
2023-01-24    253
2023-01-25    253
2023-01-26    253
2023-01-01    253
2023-01-27    253
2023-01-28    253
2023-01-02    253
2022-12-18    253
2022-12-31    253
2023-04-03    253
2022-12-05    253
2023-01-30    253
2022-12-07    253
2022-12-08    253
2022-12-09    253
2022-12-10    253
2022-12-11    253
2022-12-12    253
2022-12-13    253
2022-12-14    253
2022-12-15    253
2022-12-16    253
2022-12-17    253
2022-12-19    253
2022-12-20    253
2022-12-21    253
2022-12-22    253
2022-12-23    253
2022-12-24    253
2022-12-25    253
2022-12-26    253
2022-12-27    253
2022-12-28    253
2022-12-29    253
2022-12-30    253
2023-01-29    253
2023-03-03    253
2023-01-31    253
2023-03-16    253
2023-03-04    253
2023-03-05    253
2023-03-06    253
2023-03-07    253
2023-03-08    253
2023-03-09    253
2023-03-10    253
2023-03-11    253
2023-03-12    253
2023-03-13    253
2023-03-14    253
2023-03-15    253
2023-03-17    253
2023-02-01    253
2023-03-18    253
2023-03-19    253
2023-03-20    253
2023-03-21    253
2023-03-22    253
2023-03-23    253
2023-03-24    253
2023-03-25    253
2023-03-26    253
2023-03-27    253
2023-03-29    253
2023-03-28    253
2023-03-02    253
2023-03-01    253
2023-02-28    253
2023-02-27    253
2023-02-02    253
2023-02-03    253
2023-02-04    253
2023-02-05    253
2023-02-06    253
2023-02-07    253
2023-02-08    253
2023-02-09    253
2023-02-10    253
2023-02-11    253
2023-02-12    253
2023-02-13    253
2023-02-14    253
2023-02-15    253
2023-02-16    253
2023-02-17    253
2023-02-18    253
2023-02-19    253
2023-02-20    253
2023-02-21    253
2023-02-22    253
2023-02-23    253
2023-02-24    253
2023-02-25    253
2023-02-26    253
2023-04-02    253
2023-03-30    253
2021-01-21    253
2021-01-28    253
2021-01-14    253
2021-01-15    253
2021-01-16    253
2021-01-17    253
2021-01-18    253
2021-01-19    253
2021-01-20    253
2021-01-22    253
2021-01-23    253
2021-01-24    253
2021-01-26    253
2021-01-27    253
2021-01-25    253
2021-01-29    253
2021-02-04    253
2023-03-31    253
2023-04-01    253
2021-02-07    253
2021-02-06    253
2021-01-30    253
2021-02-05    253
2021-02-03    253
2021-02-02    253
2021-02-01    253
2021-01-31    253
2020-10-22    252
2020-10-18    252
2020-10-21    252
2020-10-20    252
2020-10-19    252
2023-04-05    252
2023-04-04    252
2020-10-15    252
2020-10-17    252
2020-10-14    252
2020-04-01    252
2020-04-02    252
2020-04-03    252
2020-04-04    252
2020-10-24    252
2020-10-23    252
2020-11-07    252
2020-10-25    252
2020-10-26    252
2020-10-27    252
2020-10-28    252
2020-10-29    252
2020-10-30    252
2020-10-31    252
2020-11-01    252
2020-11-02    252
2020-11-03    252
2020-11-04    252
2020-11-05    252
2020-11-06    252
2020-04-06    252
2020-11-08    252
2020-04-05    252
2020-04-20    252
2020-04-07    252
2020-04-08    252
2020-05-07    252
2020-05-06    252
2020-05-05    252
2020-05-04    252
2020-05-03    252
2020-05-02    252
2020-05-01    252
2020-04-30    252
2020-04-29    252
2020-04-28    252
2020-04-27    252
2020-04-26    252
2020-04-25    252
2020-04-24    252
2020-04-23    252
2020-04-22    252
2020-04-21    252
2020-11-10    252
2020-04-19    252
2020-04-18    252
2020-04-17    252
2020-04-16    252
2020-04-15    252
2020-04-14    252
2020-04-13    252
2020-04-12    252
2020-04-11    252
2020-04-10    252
2020-04-09    252
2020-11-09    252
2020-11-24    252
2020-11-11    252
2020-12-14    252
2020-12-16    252
2020-12-17    252
2020-12-18    252
2020-12-19    252
2020-12-20    252
2020-12-21    252
2020-12-22    252
2020-12-23    252
2020-12-24    252
2020-12-25    252
2020-12-26    252
2020-12-27    252
2020-12-28    252
2020-12-29    252
2020-12-30    252
2020-12-31    252
2021-01-01    252
2021-01-02    252
2021-01-03    252
2021-01-04    252
2021-01-05    252
2021-01-06    252
2021-01-07    252
2021-01-08    252
2021-01-09    252
2021-01-10    252
2021-01-11    252
2021-01-12    252
2021-01-13    252
2020-12-15    252
2020-12-13    252
2020-11-12    252
2020-12-12    252
2020-11-13    252
2020-11-14    252
2020-11-15    252
2020-11-16    252
2020-11-17    252
2020-11-18    252
2020-11-19    252
2020-11-20    252
2020-11-21    252
2020-11-22    252
2020-11-23    252
2020-05-09    252
2020-11-25    252
2020-11-26    252
2020-11-27    252
2020-11-28    252
2020-11-29    252
2020-11-30    252
2020-12-01    252
2020-12-02    252
2020-12-03    252
2020-12-04    252
2020-12-05    252
2020-12-06    252
2020-12-07    252
2020-12-08    252
2020-12-09    252
2020-12-10    252
2020-12-11    252
2020-05-08    252
2020-08-01    252
2020-05-10    252
2020-08-05    252
2020-08-12    252
2020-08-11    252
2020-08-10    252
2020-08-09    252
2020-08-08    252
2020-08-07    252
2020-08-06    252
2020-08-04    252
2020-08-14    252
2020-08-03    252
2020-08-02    252
2020-10-16    252
2020-07-31    252
2020-07-30    252
2020-06-18    252
2020-06-19    252
2020-08-13    252
2020-08-15    252
2020-07-23    252
2020-08-25    252
2020-09-01    252
2020-08-31    252
2020-08-30    252
2020-08-29    252
2020-08-28    252
2020-08-27    252
2020-08-26    252
2020-08-24    252
2020-08-16    252
2020-08-23    252
2020-08-22    252
2020-08-21    252
2020-08-20    252
2020-08-19    252
2020-08-18    252
2020-08-17    252
2020-06-20    252
2020-06-21    252
2020-07-29    252
2020-07-13    252
2020-07-06    252
2020-07-07    252
2020-07-08    252
2020-07-09    252
2020-07-10    252
2020-07-11    252
2020-07-12    252
2020-07-14    252
2020-07-28    252
2020-07-15    252
2020-07-16    252
2020-07-17    252
2020-07-18    252
2020-07-19    252
2020-07-20    252
2020-07-21    252
2020-07-05    252
2020-07-04    252
2020-07-03    252
2020-07-02    252
2020-05-11    252
2020-07-27    252
2020-07-26    252
2020-07-25    252
2020-06-22    252
2020-07-24    252
2020-06-23    252
2020-06-24    252
2020-06-25    252
2020-06-26    252
2020-06-27    252
2020-06-28    252
2020-06-29    252
2020-06-30    252
2020-07-01    252
2020-09-02    252
2020-09-03    252
2020-09-04    252
2020-06-07    252
2020-05-31    252
2020-06-01    252
2020-06-02    252
2020-06-03    252
2020-06-04    252
2020-06-05    252
2020-06-06    252
2020-06-08    252
2020-10-12    252
2020-06-09    252
2020-06-10    252
2020-06-11    252
2020-06-12    252
2020-06-13    252
2020-06-14    252
2020-06-15    252
2020-05-30    252
2020-05-29    252
2020-05-28    252
2020-05-27    252
2020-05-12    252
2020-05-13    252
2020-05-14    252
2020-10-13    252
2020-05-16    252
2020-05-17    252
2020-05-18    252
2020-05-19    252
2020-05-20    252
2020-05-21    252
2020-05-22    252
2020-05-23    252
2020-05-24    252
2020-05-25    252
2020-05-26    252
2020-06-16    252
2020-10-11    252
2020-09-05    252
2020-09-14    252
2020-09-21    252
2020-09-20    252
2020-09-19    252
2020-09-18    252
2020-09-17    252
2020-09-16    252
2020-09-15    252
2020-09-13    252
2020-06-17    252
2020-09-12    252
2020-09-11    252
2020-09-10    252
2020-09-09    252
2020-09-08    252
2020-09-07    252
2020-09-06    252
2020-09-22    252
2020-09-23    252
2020-09-24    252
2020-09-25    252
2020-10-10    252
2020-10-09    252
2020-10-08    252
2020-10-07    252
2020-10-06    252
2020-10-05    252
2020-10-04    252
2020-10-03    252
2020-10-02    252
2020-10-01    252
2020-09-30    252
2020-09-29    252
2020-09-28    252
2020-09-27    252
2020-09-26    252
2020-07-22    252
2020-03-29    251
2020-03-24    251
2020-05-15    251
2020-03-31    251
2020-03-30    251
2020-03-28    251
2020-03-27    251
2020-03-26    251
2020-03-25    251
2020-03-23    251
2020-03-22    251
2020-03-21    251
2020-03-20    251
2020-03-08    250
2020-03-10    250
2020-03-09    250
2020-03-13    250
2020-03-07    250
2020-03-12    250
2020-03-11    250
2020-03-16    250
2020-03-14    250
2020-03-15    250
2020-03-17    250
2020-03-18    250
2020-03-19    250
2020-03-06    249
2020-03-05    249
2020-03-04    249
2020-03-03    249
2020-03-02    249
2020-03-01    249
2020-02-23    248
2020-02-26    248
2020-02-24    248
2020-02-25    248
2020-02-20    248
2020-02-27    248
2020-02-28    248
2020-02-29    248
2020-02-21    248
2020-02-22    248
2023-04-08    248
2020-01-31    248
2020-02-08    248
2023-04-06    248
2023-04-07    248
2023-04-09    248
2020-02-19    248
2020-02-01    248
2020-02-02    248
2020-02-03    248
2020-02-04    248
2020-02-06    248
2020-02-07    248
2020-02-05    248
2020-02-09    248
2020-02-11    248
2020-02-12    248
2020-02-13    248
2020-02-14    248
2020-02-15    248
2020-02-10    248
2020-02-16    248
2020-02-17    248
2020-02-18    248
2020-01-17    247
2020-01-19    247
2020-01-18    247
2023-04-10    247
2020-01-16    247
2020-01-21    247
2020-01-20    247
2023-04-11    247
2020-01-22    247
2020-01-23    247
2020-01-24    247
2020-01-25    247
2020-01-26    247
2020-01-27    247
2020-01-28    247
2020-01-29    247
2020-01-30    247
2023-04-12    247
2020-01-03    246
2020-01-04    246
2020-01-15    246
2020-01-14    246
2020-01-13    246
2020-01-12    246
2020-01-11    246
2020-01-10    246
2020-01-09    246
2020-01-08    246
2020-01-07    246
2020-01-06    246
2020-01-05    246
2020-01-01      2
2020-01-02      2
Name: count, dtype: int64
--------------------------------------------------
Unique values in 'tests_units':
tests_units
tests performed    223167
people tested       52600
samples tested      25520
units unclear        1225
Name: count, dtype: int64
--------------------------------------------------

Explanation:

✅ Identifies categorical columns
✅ Displays all unique values along with their counts
✅ Includes NaN values for completeness
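
The three points above can be sketched on a small toy frame. This is only an illustration: the frame `toy`, its values, and the names `toy_cat_cols` and `summary` are assumptions made for this example and are not part of the notebook's dataset or code. Treating every non-numeric column as categorical is also an assumption, and the notebook's own `cat_cols` may have been built differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only, not taken from the source data)
toy = pd.DataFrame({
    "location": ["Chile", "Chile", "Peru", "Peru", "Peru"],
    "tests_units": ["tests performed", np.nan, "people tested", np.nan, np.nan],
    "new_cases": [10, 12, 7, 9, 11],
})

# 1. Identify categorical columns (assumption: anything that is not numeric)
toy_cat_cols = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]

# 2. + 3. Count each unique value, keeping NaN as its own bucket
summary = {col: toy[col].value_counts(dropna=False) for col in toy_cat_cols}

for col, counts in summary.items():
    print(f"Unique values in '{col}':")
    print(counts.head(10))  # cap the listing so high-cardinality columns stay readable
    print("-" * 50)
```

Capping the printout with `head(10)` is a design choice for readability: a column such as 'date' has over a thousand distinct values, and printing every one of them makes the output hard to scan.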

# Drop the two least-covered locations seen in the counts above:
# OWID_CYN (Northern Cyprus, 691 rows) and ESH (Western Sahara, 1 row)
invalid_iso_codes = ['OWID_CYN', 'ESH']

# Filter out invalid iso_codes
df = df[~df['iso_code'].isin(invalid_iso_codes)]

# Flag aggregate (non-country) rows by name, e.g. "Lower middle income" or "World".
# Note: continent-level aggregates such as "Africa" or "Europe" contain none of
# these terms, so this check does not remove them.
def is_valid_location(loc):
    invalid_terms = ["income", "World", "continent", "region"]  # Broad terms that mark aggregate rows
    return not any(term.lower() in loc.lower() for term in invalid_terms)

# Apply filtering
df = df[df['location'].apply(is_valid_location)]

# Print cleaned categorical data summary
for col in cat_cols:
    print(f"Column: {col}")
    print(df[col].value_counts(dropna=False))
    print("-" * 50)
Column: iso_code
iso_code
ARG         1198
MEX         1198
AFG         1196
PSE         1196
NER         1196
NGA         1196
NIU         1196
OWID_NAM    1196
PRK         1196
MKD         1196
MNP         1196
NOR         1196
OWID_OCE    1196
OMN         1196
PAK         1196
PLW         1196
PAN         1196
NZL         1196
PNG         1196
PRY         1196
PER         1196
PHL         1196
PCN         1196
POL         1196
PRT         1196
PRI         1196
QAT         1196
REU         1196
ROU         1196
RUS         1196
NIC         1196
NLD         1196
NCL         1196
MUS         1196
OWID_AFR    1196
LTU         1196
LUX         1196
MDG         1196
MWI         1196
MYS         1196
MDV         1196
MLI         1196
MLT         1196
MHL         1196
MTQ         1196
MRT         1196
MYT         1196
BLM         1196
FSM         1196
MDA         1196
MCO         1196
MNG         1196
MNE         1196
MSR         1196
MAR         1196
MOZ         1196
MMR         1196
NAM         1196
NRU         1196
NPL         1196
RWA         1196
SHN         1196
LBR         1196
UKR         1196
THA         1196
TLS         1196
TGO         1196
TKL         1196
TON         1196
TTO         1196
TUN         1196
TUR         1196
TKM         1196
TCA         1196
TUV         1196
UGA         1196
ARE         1196
TJK         1196
GBR         1196
USA         1196
VIR         1196
URY         1196
UZB         1196
VUT         1196
VAT         1196
VEN         1196
VNM         1196
WLF         1196
YEM         1196
ZMB         1196
TZA         1196
SYR         1196
KNA         1196
SGP         1196
LCA         1196
MAF         1196
SPM         1196
VCT         1196
WSM         1196
SMR         1196
STP         1196
SAU         1196
SEN         1196
SRB         1196
SYC         1196
SLE         1196
SXM         1196
CHE         1196
SVK         1196
SVN         1196
SLB         1196
SOM         1196
ZAF         1196
OWID_SAM    1196
KOR         1196
SSD         1196
ESP         1196
LKA         1196
SDN         1196
SWE         1196
LBY         1196
LIE         1196
LSO         1196
BRN         1196
BFA         1196
BDI         1196
KHM         1196
CMR         1196
CAN         1196
CPV         1196
CYM         1196
CAF         1196
TCD         1196
CHL         1196
CHN         1196
COL         1196
COM         1196
COG         1196
COK         1196
CRI         1196
CIV         1196
HRV         1196
CUB         1196
CUW         1196
CYP         1196
CZE         1196
COD         1196
DNK         1196
DJI         1196
DMA         1196
LBN         1196
BGR         1196
VGB         1196
EGY         1196
BRA         1196
ALB         1196
DZA         1196
ASM         1196
AND         1196
AGO         1196
AIA         1196
ATG         1196
ARM         1196
ABW         1196
OWID_ASI    1196
AUS         1196
AUT         1196
AZE         1196
BHS         1196
BHR         1196
BGD         1196
BRB         1196
BLR         1196
BEL         1196
BLZ         1196
BEN         1196
BMU         1196
BTN         1196
BOL         1196
BES         1196
BIH         1196
BWA         1196
ECU         1196
DOM         1196
SLV         1196
ISR         1196
GNB         1196
GUY         1196
HTI         1196
HND         1196
HUN         1196
ISL         1196
IND         1196
IDN         1196
IRN         1196
IRQ         1196
IRL         1196
IMN         1196
ITA         1196
GTM         1196
JAM         1196
JPN         1196
JEY         1196
JOR         1196
KAZ         1196
KEN         1196
KIR         1196
OWID_KOS    1196
KWT         1196
KGZ         1196
LAO         1196
LVA         1196
GGY         1196
GIN         1196
GUF         1196
PYF         1196
GNQ         1196
ERI         1196
EST         1196
SWZ         1196
ETH         1196
OWID_EUR    1196
OWID_EUN    1196
FRO         1196
FLK         1196
FJI         1196
FIN         1196
FRA         1196
GUM         1196
ZWE         1196
GAB         1196
GIB         1196
GLP         1196
GRD         1196
GRL         1196
GMB         1196
GRC         1196
GHA         1196
DEU         1196
GEO         1196
SUR         1195
TWN         1183
HKG         1165
OWID_NIR    1131
OWID_SCT    1123
OWID_ENG    1112
OWID_WLS    1100
MAC          787
Name: count, dtype: int64
--------------------------------------------------
Column: continent
continent
Africa           71760
Europe           66658
Asia             60543
North America    50234
Oceania          29900
South America    16745
Name: count, dtype: int64
--------------------------------------------------
Column: location
location
Argentina                           1198
Mexico                              1198
Afghanistan                         1196
Palestine                           1196
Niger                               1196
Nigeria                             1196
Niue                                1196
North America                       1196
North Korea                         1196
North Macedonia                     1196
Northern Mariana Islands            1196
Norway                              1196
Oceania                             1196
Oman                                1196
Pakistan                            1196
Palau                               1196
Panama                              1196
New Zealand                         1196
Papua New Guinea                    1196
Paraguay                            1196
Peru                                1196
Philippines                         1196
Pitcairn                            1196
Poland                              1196
Portugal                            1196
Puerto Rico                         1196
Qatar                               1196
Reunion                             1196
Romania                             1196
Russia                              1196
Nicaragua                           1196
Netherlands                         1196
New Caledonia                       1196
Mauritius                           1196
Africa                              1196
Lithuania                           1196
Luxembourg                          1196
Madagascar                          1196
Malawi                              1196
Malaysia                            1196
Maldives                            1196
Mali                                1196
Malta                               1196
Marshall Islands                    1196
Martinique                          1196
Mauritania                          1196
Mayotte                             1196
Saint Barthelemy                    1196
Micronesia (country)                1196
Moldova                             1196
Monaco                              1196
Mongolia                            1196
Montenegro                          1196
Montserrat                          1196
Morocco                             1196
Mozambique                          1196
Myanmar                             1196
Namibia                             1196
Nauru                               1196
Nepal                               1196
Rwanda                              1196
Saint Helena                        1196
Liberia                             1196
Ukraine                             1196
Thailand                            1196
Timor                               1196
Togo                                1196
Tokelau                             1196
Tonga                               1196
Trinidad and Tobago                 1196
Tunisia                             1196
Turkey                              1196
Turkmenistan                        1196
Turks and Caicos Islands            1196
Tuvalu                              1196
Uganda                              1196
United Arab Emirates                1196
Tajikistan                          1196
United Kingdom                      1196
United States                       1196
United States Virgin Islands        1196
Uruguay                             1196
Uzbekistan                          1196
Vanuatu                             1196
Vatican                             1196
Venezuela                           1196
Vietnam                             1196
Wallis and Futuna                   1196
Yemen                               1196
Zambia                              1196
Tanzania                            1196
Syria                               1196
Saint Kitts and Nevis               1196
Singapore                           1196
Saint Lucia                         1196
Saint Martin (French part)          1196
Saint Pierre and Miquelon           1196
Saint Vincent and the Grenadines    1196
Samoa                               1196
San Marino                          1196
Sao Tome and Principe               1196
Saudi Arabia                        1196
Senegal                             1196
Serbia                              1196
Seychelles                          1196
Sierra Leone                        1196
Sint Maarten (Dutch part)           1196
Switzerland                         1196
Slovakia                            1196
Slovenia                            1196
Solomon Islands                     1196
Somalia                             1196
South Africa                        1196
South America                       1196
South Korea                         1196
South Sudan                         1196
Spain                               1196
Sri Lanka                           1196
Sudan                               1196
Sweden                              1196
Libya                               1196
Liechtenstein                       1196
Lesotho                             1196
Brunei                              1196
Burkina Faso                        1196
Burundi                             1196
Cambodia                            1196
Cameroon                            1196
Canada                              1196
Cape Verde                          1196
Cayman Islands                      1196
Central African Republic            1196
Chad                                1196
Chile                               1196
China                               1196
Colombia                            1196
Comoros                             1196
Congo                               1196
Cook Islands                        1196
Costa Rica                          1196
Cote d'Ivoire                       1196
Croatia                             1196
Cuba                                1196
Curacao                             1196
Cyprus                              1196
Czechia                             1196
Democratic Republic of Congo        1196
Denmark                             1196
Djibouti                            1196
Dominica                            1196
Lebanon                             1196
Bulgaria                            1196
British Virgin Islands              1196
Egypt                               1196
Brazil                              1196
Albania                             1196
Algeria                             1196
American Samoa                      1196
Andorra                             1196
Angola                              1196
Anguilla                            1196
Antigua and Barbuda                 1196
Armenia                             1196
Aruba                               1196
Asia                                1196
Australia                           1196
Austria                             1196
Azerbaijan                          1196
Bahamas                             1196
Bahrain                             1196
Bangladesh                          1196
Barbados                            1196
Belarus                             1196
Belgium                             1196
Belize                              1196
Benin                               1196
Bermuda                             1196
Bhutan                              1196
Bolivia                             1196
Bonaire Sint Eustatius and Saba     1196
Bosnia and Herzegovina              1196
Botswana                            1196
Ecuador                             1196
Dominican Republic                  1196
El Salvador                         1196
Israel                              1196
Guinea-Bissau                       1196
Guyana                              1196
Haiti                               1196
Honduras                            1196
Hungary                             1196
Iceland                             1196
India                               1196
Indonesia                           1196
Iran                                1196
Iraq                                1196
Ireland                             1196
Isle of Man                         1196
Italy                               1196
Guatemala                           1196
Jamaica                             1196
Japan                               1196
Jersey                              1196
Jordan                              1196
Kazakhstan                          1196
Kenya                               1196
Kiribati                            1196
Kosovo                              1196
Kuwait                              1196
Kyrgyzstan                          1196
Laos                                1196
Latvia                              1196
Guernsey                            1196
Guinea                              1196
French Guiana                       1196
French Polynesia                    1196
Equatorial Guinea                   1196
Eritrea                             1196
Estonia                             1196
Eswatini                            1196
Ethiopia                            1196
Europe                              1196
European Union                      1196
Faeroe Islands                      1196
Falkland Islands                    1196
Fiji                                1196
Finland                             1196
France                              1196
Guam                                1196
Zimbabwe                            1196
Gabon                               1196
Gibraltar                           1196
Guadeloupe                          1196
Grenada                             1196
Greenland                           1196
Gambia                              1196
Greece                              1196
Ghana                               1196
Germany                             1196
Georgia                             1196
Suriname                            1195
Taiwan                              1183
Hong Kong                           1165
Northern Ireland                    1131
Scotland                            1123
England                             1112
Wales                               1100
Macao                                787
Name: count, dtype: int64
--------------------------------------------------
Column: date
date
2021-08-24    248
2022-02-18    248
2022-03-06    248
2022-03-05    248
2022-03-04    248
2022-03-03    248
2022-03-02    248
2022-03-01    248
2022-02-28    248
2022-02-27    248
2022-02-26    248
2022-02-25    248
2022-02-24    248
2022-02-23    248
2022-02-22    248
2022-02-21    248
2022-02-20    248
2022-03-07    248
2022-03-08    248
2022-03-09    248
2022-03-18    248
2022-03-24    248
2022-03-23    248
2022-03-22    248
2022-03-21    248
2022-03-20    248
2022-03-19    248
2022-03-17    248
2022-03-10    248
2022-03-16    248
2022-03-15    248
2022-03-14    248
2022-03-13    248
2022-03-12    248
2022-03-11    248
2022-02-19    248
2022-02-17    248
2022-03-26    248
2022-02-16    248
2022-01-28    248
2022-01-27    248
2022-01-26    248
2022-01-25    248
2022-01-24    248
2022-01-23    248
2022-01-22    248
2022-01-21    248
2022-01-20    248
2022-01-19    248
2022-01-18    248
2022-01-17    248
2022-01-16    248
2022-01-15    248
2022-01-14    248
2022-01-29    248
2022-01-30    248
2022-01-31    248
2022-02-09    248
2022-02-15    248
2022-02-14    248
2022-02-13    248
2022-02-12    248
2022-02-11    248
2022-02-10    248
2022-02-08    248
2022-02-01    248
2022-02-07    248
2022-02-06    248
2022-02-05    248
2022-02-04    248
2022-02-03    248
2022-02-02    248
2022-03-25    248
2022-03-27    248
2022-01-12    248
2022-05-03    248
2022-05-19    248
2022-05-18    248
2022-05-17    248
2022-05-16    248
2022-05-15    248
2022-05-14    248
2022-05-13    248
2022-05-12    248
2022-05-11    248
2022-05-10    248
2022-05-09    248
2022-05-08    248
2022-05-07    248
2022-05-06    248
2022-05-05    248
2022-05-20    248
2022-05-21    248
2022-05-22    248
2022-05-31    248
2022-06-06    248
2022-06-05    248
2022-06-04    248
2022-06-03    248
2022-06-02    248
2022-06-01    248
2022-05-30    248
2022-05-23    248
2022-05-29    248
2022-05-28    248
2022-05-27    248
2022-05-26    248
2022-05-25    248
2022-05-24    248
2022-05-04    248
2022-05-02    248
2022-03-28    248
2022-05-01    248
2022-04-12    248
2022-04-11    248
2022-04-10    248
2022-04-09    248
2022-04-08    248
2022-04-07    248
2022-04-06    248
2022-04-05    248
2022-04-04    248
2022-04-03    248
2022-04-02    248
2022-04-01    248
2022-03-31    248
2022-03-30    248
2022-03-29    248
2022-04-13    248
2022-04-14    248
2022-04-15    248
2022-04-24    248
2022-04-30    248
2022-04-29    248
2022-04-28    248
2022-04-27    248
2022-04-26    248
2022-04-25    248
2022-04-23    248
2022-04-16    248
2022-04-22    248
2022-04-21    248
2022-04-20    248
2022-04-19    248
2022-04-18    248
2022-04-17    248
2022-01-13    248
2022-01-11    248
2022-06-08    248
2021-09-23    248
2021-10-09    248
2021-10-08    248
2021-10-07    248
2021-10-06    248
2021-10-05    248
2021-10-04    248
2021-10-03    248
2021-10-02    248
2021-10-01    248
2021-09-30    248
2021-09-29    248
2021-09-28    248
2021-09-27    248
2021-09-26    248
2021-09-25    248
2021-10-10    248
2021-10-11    248
2021-10-12    248
2021-10-21    248
2021-10-27    248
2021-10-26    248
2021-10-25    248
2021-10-24    248
2021-10-23    248
2021-10-22    248
2021-10-20    248
2021-10-13    248
2021-10-19    248
2021-10-18    248
2021-10-17    248
2021-10-16    248
2021-10-15    248
2021-10-14    248
2021-09-24    248
2021-09-22    248
2021-10-29    248
2021-09-21    248
2021-09-02    248
2021-09-01    248
2021-08-31    248
2021-08-30    248
2021-08-29    248
2021-08-28    248
2021-08-27    248
2021-08-26    248
2021-08-25    248
2023-04-01    248
2021-08-23    248
2021-08-22    248
2021-08-21    248
2021-08-20    248
2021-08-19    248
2021-09-03    248
2021-09-04    248
2021-09-05    248
2021-09-14    248
2021-09-20    248
2021-09-19    248
2021-09-18    248
2021-09-17    248
2021-09-16    248
2021-09-15    248
2021-09-13    248
2021-09-06    248
2021-09-12    248
2021-09-11    248
2021-09-10    248
2021-09-09    248
2021-09-08    248
2021-09-07    248
2021-10-28    248
2021-10-30    248
2022-01-10    248
2021-12-06    248
2021-12-22    248
2021-12-21    248
2021-12-20    248
2021-12-19    248
2021-12-18    248
2021-12-17    248
2021-12-16    248
2021-12-15    248
2021-12-14    248
2021-12-13    248
2021-12-12    248
2021-12-11    248
2021-12-10    248
2021-12-09    248
2021-12-08    248
2021-12-23    248
2021-12-24    248
2021-12-25    248
2022-01-03    248
2022-01-09    248
2022-01-08    248
2022-01-07    248
2022-01-06    248
2022-01-05    248
2022-01-04    248
2022-01-02    248
2021-12-26    248
2022-01-01    248
2021-12-31    248
2021-12-30    248
2021-12-29    248
2021-12-28    248
2021-12-27    248
2021-12-07    248
2021-12-05    248
2021-10-31    248
2021-12-04    248
2021-11-15    248
2021-11-14    248
2021-11-13    248
2021-11-12    248
2021-11-11    248
2021-11-10    248
2021-11-09    248
2021-11-08    248
2021-11-07    248
2021-11-06    248
2021-11-05    248
2021-11-04    248
2021-11-03    248
2021-11-02    248
2021-11-01    248
2021-11-16    248
2021-11-17    248
2021-11-18    248
2021-11-27    248
2021-12-03    248
2021-12-02    248
2021-12-01    248
2021-11-30    248
2021-11-29    248
2021-11-28    248
2021-11-26    248
2021-11-19    248
2021-11-25    248
2021-11-24    248
2021-11-23    248
2021-11-22    248
2021-11-21    248
2021-11-20    248
2022-06-07    248
2022-06-09    248
2021-08-17    248
2022-12-11    248
2022-12-27    248
2022-12-26    248
2022-12-25    248
2022-12-24    248
2022-12-23    248
2022-12-22    248
2022-12-21    248
2022-12-20    248
2022-12-19    248
2022-12-18    248
2022-12-17    248
2022-12-16    248
2022-12-15    248
2022-12-14    248
2022-12-13    248
2022-12-28    248
2022-12-29    248
2022-12-30    248
2023-01-08    248
2023-01-14    248
2023-01-13    248
2023-01-12    248
2023-01-11    248
2023-01-10    248
2023-01-09    248
2023-01-07    248
2022-12-31    248
2023-01-06    248
2023-01-05    248
2023-01-04    248
2023-01-03    248
2023-01-02    248
2023-01-01    248
2022-12-12    248
2022-12-10    248
2023-01-16    248
2022-12-09    248
2022-11-20    248
2022-11-19    248
2022-11-18    248
2022-11-17    248
2022-11-16    248
2022-11-15    248
2022-11-14    248
2022-11-13    248
2022-11-12    248
2022-11-11    248
2022-11-10    248
2022-11-09    248
2022-11-08    248
2022-11-07    248
2022-11-06    248
2022-11-21    248
2022-11-22    248
2022-11-23    248
2022-12-02    248
2022-12-08    248
2022-12-07    248
2022-12-06    248
2022-12-05    248
2022-12-04    248
2022-12-03    248
2022-12-01    248
2022-11-24    248
2022-11-30    248
2022-11-29    248
2022-11-28    248
2022-11-27    248
2022-11-26    248
2022-11-25    248
2023-01-15    248
2023-01-17    248
2022-11-04    248
2023-02-23    248
2023-03-11    248
2023-03-10    248
2023-03-09    248
2023-03-08    248
2023-03-07    248
2023-03-06    248
2023-03-05    248
2023-03-04    248
2023-03-03    248
2023-03-02    248
2023-03-01    248
2023-02-28    248
2023-02-27    248
2023-02-26    248
2023-02-25    248
2023-03-12    248
2023-03-13    248
2023-03-14    248
2023-03-23    248
2023-03-29    248
2023-03-28    248
2023-03-27    248
2023-03-26    248
2023-03-25    248
2023-03-24    248
2023-03-22    248
2023-03-15    248
2023-03-21    248
2023-03-20    248
2023-03-19    248
2023-03-18    248
2023-03-17    248
2023-03-16    248
2023-02-24    248
2023-02-22    248
2023-01-18    248
2023-02-21    248
2023-02-02    248
2023-02-01    248
2023-01-31    248
2023-01-30    248
2023-01-29    248
2023-01-28    248
2023-01-27    248
2023-01-26    248
2023-01-25    248
2023-01-24    248
2023-01-23    248
2023-01-22    248
2023-01-21    248
2023-01-20    248
2023-01-19    248
2023-02-03    248
2023-02-04    248
2023-02-05    248
2023-02-14    248
2023-02-20    248
2023-02-19    248
2023-02-18    248
2023-02-17    248
2023-02-16    248
2023-02-15    248
2023-02-13    248
2023-02-06    248
2023-02-12    248
2023-02-11    248
2023-02-10    248
2023-02-09    248
2023-02-08    248
2023-02-07    248
2022-11-05    248
2022-11-03    248
2022-06-10    248
2022-07-16    248
2022-08-01    248
2022-07-31    248
2022-07-30    248
2022-07-29    248
2022-07-28    248
2022-07-27    248
2022-07-26    248
2022-07-25    248
2022-07-24    248
2022-07-23    248
2022-07-22    248
2022-07-21    248
2022-07-20    248
2022-07-19    248
2022-07-18    248
2022-08-02    248
2022-08-03    248
2022-08-04    248
2022-08-13    248
2022-08-19    248
2022-08-18    248
2022-08-17    248
2022-08-16    248
2022-08-15    248
2022-08-14    248
2022-08-12    248
2022-08-05    248
2022-08-11    248
2022-08-10    248
2022-08-09    248
2022-08-08    248
2022-08-07    248
2022-08-06    248
2022-07-17    248
2022-07-15    248
2022-08-21    248
2022-07-14    248
2022-06-25    248
2022-06-24    248
2022-06-23    248
2022-06-22    248
2022-06-21    248
2022-06-20    248
2022-06-19    248
2022-06-18    248
2022-06-17    248
2022-06-16    248
2022-06-15    248
2022-06-14    248
2022-06-13    248
2022-06-12    248
2022-06-11    248
2022-06-26    248
2022-06-27    248
2022-06-28    248
2022-07-07    248
2022-07-13    248
2022-07-12    248
2022-07-11    248
2022-07-10    248
2022-07-09    248
2022-07-08    248
2022-07-06    248
2022-06-29    248
2022-07-05    248
2022-07-04    248
2022-07-03    248
2022-07-02    248
2022-07-01    248
2022-06-30    248
2022-08-20    248
2022-08-22    248
2022-11-02    248
2022-09-28    248
2022-10-14    248
2022-10-13    248
2022-10-12    248
2022-10-11    248
2022-10-10    248
2022-10-09    248
2022-10-08    248
2022-10-07    248
2022-10-06    248
2022-10-05    248
2022-10-04    248
2022-10-03    248
2022-10-02    248
2022-10-01    248
2022-09-30    248
2022-10-15    248
2022-10-16    248
2022-10-17    248
2022-10-26    248
2022-11-01    248
2022-10-31    248
2022-10-30    248
2022-10-29    248
2022-10-28    248
2022-10-27    248
2022-10-25    248
2022-10-18    248
2022-10-24    248
2022-10-23    248
2022-10-22    248
2022-10-21    248
2022-10-20    248
2022-10-19    248
2022-09-29    248
2022-09-27    248
2022-08-23    248
2022-09-26    248
2022-09-07    248
2022-09-06    248
2022-09-05    248
2022-09-04    248
2022-09-03    248
2022-09-02    248
2022-09-01    248
2022-08-31    248
2022-08-30    248
2022-08-29    248
2022-08-28    248
2022-08-27    248
2022-08-26    248
2022-08-25    248
2022-08-24    248
2022-09-08    248
2022-09-09    248
2022-09-10    248
2022-09-19    248
2022-09-25    248
2022-09-24    248
2022-09-23    248
2022-09-22    248
2022-09-21    248
2022-09-20    248
2022-09-18    248
2022-09-11    248
2022-09-17    248
2022-09-16    248
2022-09-15    248
2022-09-14    248
2022-09-13    248
2022-09-12    248
2021-08-18    248
2021-08-16    248
2023-03-31    248
2021-03-24    248
2021-04-15    248
2021-04-14    248
2021-04-13    248
2021-04-12    248
2021-04-11    248
2021-04-10    248
2021-04-09    248
2021-04-08    248
2021-04-07    248
2021-04-06    248
2021-04-05    248
2021-04-04    248
2021-04-03    248
2021-04-02    248
2021-04-01    248
2021-03-31    248
2021-03-30    248
2021-03-29    248
2021-03-28    248
2021-03-27    248
2021-03-26    248
2021-04-16    248
2021-04-17    248
2021-04-18    248
2021-04-30    248
2021-05-09    248
2021-05-08    248
2021-05-07    248
2021-05-06    248
2021-05-05    248
2021-05-04    248
2021-05-03    248
2021-05-02    248
2021-05-01    248
2021-04-29    248
2021-04-19    248
2021-04-28    248
2021-04-27    248
2021-04-26    248
2021-04-25    248
2021-04-24    248
2021-04-23    248
2021-04-22    248
2021-04-21    248
2021-04-20    248
2021-03-25    248
2021-03-23    248
2021-05-11    248
2021-03-22    248
2021-02-25    248
2021-02-24    248
2021-02-23    248
2021-02-22    248
2021-02-21    248
2021-02-20    248
2021-02-19    248
2021-02-18    248
2021-02-17    248
2021-02-16    248
2021-02-15    248
2021-02-14    248
2021-02-13    248
2021-02-12    248
2021-02-11    248
2021-02-10    248
2021-02-09    248
2021-02-08    248
2021-08-15    248
2023-04-02    248
2023-04-03    248
2021-02-26    248
2021-02-27    248
2021-02-28    248
2021-03-12    248
2021-03-21    248
2021-03-20    248
2021-03-19    248
2021-03-18    248
2021-03-17    248
2021-03-16    248
2021-03-15    248
2021-03-14    248
2021-03-13    248
2021-03-11    248
2021-03-01    248
2021-03-10    248
2021-03-09    248
2021-03-08    248
2021-03-07    248
2021-03-06    248
2021-03-05    248
2021-03-04    248
2021-03-03    248
2021-03-02    248
2021-05-10    248
2023-03-30    248
2021-05-12    248
2021-06-29    248
2021-07-21    248
2021-07-20    248
2021-07-19    248
2021-07-18    248
2021-07-17    248
2021-07-16    248
2021-07-15    248
2021-07-14    248
2021-07-13    248
2021-07-12    248
2021-07-11    248
2021-07-10    248
2021-07-09    248
2021-07-08    248
2021-07-07    248
2021-07-06    248
2021-07-05    248
2021-07-04    248
2021-07-03    248
2021-07-02    248
2021-07-01    248
2021-07-22    248
2021-07-23    248
2021-07-24    248
2021-08-06    248
2021-08-12    248
2021-05-13    248
2021-08-13    248
2021-08-14    248
2021-08-11    248
2021-08-10    248
2021-08-09    248
2021-08-08    248
2021-08-07    248
2021-08-04    248
2021-07-25    248
2021-08-03    248
2021-08-02    248
2021-08-01    248
2021-07-31    248
2021-07-30    248
2021-07-29    248
2021-07-28    248
2021-07-27    248
2021-07-26    248
2021-06-30    248
2021-08-05    248
2021-06-28    248
2021-05-24    248
2021-06-02    248
2021-06-01    248
2021-05-31    248
2021-05-30    248
2021-05-29    248
2021-05-28    248
2021-05-27    248
2021-05-26    248
2021-05-25    248
2021-05-23    248
2021-06-04    248
2021-05-22    248
2021-05-21    248
2021-05-20    248
2021-05-18    248
2021-05-17    248
2021-05-16    248
2021-05-15    248
2021-06-27    248
2021-05-14    248
2021-06-03    248
2021-05-19    248
2021-06-05    248
2021-06-25    248
2021-06-26    248
2021-06-24    248
2021-06-23    248
2021-06-22    248
2021-06-21    248
2021-06-20    248
2021-06-19    248
2021-06-06    248
2021-06-17    248
2021-06-18    248
2021-06-16    248
2021-06-15    248
2021-06-14    248
2021-06-13    248
2021-06-12    248
2021-06-11    248
2021-06-10    248
2021-06-09    248
2021-06-08    248
2021-06-07    248
2020-07-08    247
2020-07-09    247
2020-07-10    247
2020-07-11    247
2020-07-12    247
2020-07-13    247
2020-07-20    247
2020-07-14    247
2020-07-15    247
2020-07-16    247
2020-07-17    247
2020-07-18    247
2020-07-19    247
2020-07-06    247
2020-07-21    247
2020-07-22    247
2020-07-07    247
2020-06-26    247
2020-07-05    247
2020-06-24    247
2020-06-18    247
2020-07-24    247
2020-06-19    247
2020-06-20    247
2020-06-21    247
2020-06-22    247
2020-06-23    247
2020-06-25    247
2020-07-04    247
2020-06-27    247
2020-06-28    247
2020-06-29    247
2020-06-30    247
2020-07-01    247
2020-07-02    247
2020-07-03    247
2020-07-23    247
2020-08-15    247
2020-07-25    247
2020-08-23    247
2020-08-16    247
2020-08-17    247
2020-08-18    247
2020-08-19    247
2020-08-20    247
2020-08-21    247
2020-08-22    247
2020-08-24    247
2020-07-26    247
2020-08-25    247
2020-08-26    247
2020-08-27    247
2020-08-28    247
2020-08-29    247
2020-08-30    247
2020-06-16    247
2020-08-14    247
2020-08-13    247
2020-08-12    247
2020-08-11    247
2020-07-27    247
2020-07-28    247
2020-07-29    247
2020-07-30    247
2020-07-31    247
2020-08-01    247
2020-08-02    247
2020-08-03    247
2020-08-04    247
2020-08-05    247
2020-08-06    247
2020-08-07    247
2020-08-08    247
2020-08-09    247
2020-08-10    247
2020-06-17    247
2020-04-30    247
2020-06-15    247
2020-04-26    247
2020-04-19    247
2020-04-20    247
2020-04-21    247
2020-04-22    247
2020-04-23    247
2020-04-24    247
2020-04-25    247
2020-04-27    247
2020-04-17    247
2020-04-28    247
2020-04-29    247
2020-09-01    247
2020-05-01    247
2020-05-02    247
2020-05-03    247
2020-05-04    247
2020-04-18    247
2020-04-16    247
2020-06-14    247
2020-04-06    247
2023-04-05    247
2023-04-04    247
2020-04-01    247
2020-04-02    247
2020-04-03    247
2020-04-04    247
2020-04-05    247
2020-04-07    247
2020-04-15    247
2020-04-08    247
2020-04-09    247
2020-04-10    247
2020-04-11    247
2020-04-12    247
2020-04-13    247
2020-04-14    247
2020-05-05    247
2020-05-06    247
2020-05-07    247
2020-06-05    247
2020-05-29    247
2020-05-30    247
2020-05-31    247
2020-06-01    247
2020-06-02    247
2020-06-03    247
2020-06-04    247
2020-06-06    247
2020-05-08    247
2020-06-07    247
2020-06-08    247
2020-06-09    247
2020-06-10    247
2020-06-11    247
2020-06-12    247
2020-06-13    247
2020-05-28    247
2020-05-27    247
2020-05-26    247
2020-05-25    247
2020-05-09    247
2020-05-10    247
2020-05-11    247
2020-05-12    247
2020-05-13    247
2020-05-14    247
2020-05-16    247
2020-05-17    247
2020-05-18    247
2020-05-19    247
2020-05-20    247
2020-05-21    247
2020-05-22    247
2020-05-23    247
2020-05-24    247
2020-08-31    247
2020-09-06    247
2020-09-02    247
2020-12-21    247
2020-12-14    247
2020-12-15    247
2020-12-16    247
2020-12-17    247
2020-12-18    247
2020-12-19    247
2020-12-20    247
2020-12-22    247
2020-12-12    247
2020-12-23    247
2020-12-24    247
2020-12-25    247
2020-12-26    247
2020-12-27    247
2020-12-28    247
2020-12-29    247
2020-12-13    247
2020-12-11    247
2020-12-31    247
2020-12-01    247
2020-11-24    247
2020-11-25    247
2020-11-26    247
2020-11-27    247
2020-11-28    247
2020-11-29    247
2020-11-30    247
2020-12-02    247
2020-12-10    247
2020-12-03    247
2020-12-04    247
2020-12-05    247
2020-12-06    247
2020-12-07    247
2020-12-08    247
2020-12-09    247
2020-12-30    247
2021-01-01    247
2020-11-22    247
2021-01-30    247
2021-01-23    247
2021-01-24    247
2021-01-25    247
2021-01-26    247
2021-01-27    247
2021-01-28    247
2021-01-29    247
2021-01-31    247
2021-01-21    247
2021-02-01    247
2021-02-02    247
2021-02-03    247
2021-02-04    247
2021-02-05    247
2021-02-06    247
2021-02-07    247
2021-01-22    247
2021-01-20    247
2020-09-03    247
2021-01-10    247
2021-01-03    247
2021-01-04    247
2021-01-05    247
2021-01-06    247
2021-01-07    247
2021-01-08    247
2021-01-09    247
2021-01-11    247
2021-01-19    247
2021-01-12    247
2021-01-13    247
2021-01-14    247
2021-01-15    247
2021-01-16    247
2021-01-17    247
2021-01-18    247
2020-11-23    247
2021-01-02    247
2020-11-21    247
2020-10-12    247
2020-09-24    247
2020-09-25    247
2020-09-26    247
2020-09-27    247
2020-09-28    247
2020-09-29    247
2020-09-30    247
2020-10-01    247
2020-10-02    247
2020-10-03    247
2020-10-04    247
2020-10-05    247
2020-10-06    247
2020-10-07    247
2020-10-08    247
2020-10-09    247
2020-10-10    247
2020-09-23    247
2020-09-22    247
2020-09-21    247
2020-09-11    247
2020-09-04    247
2020-09-05    247
2020-09-07    247
2020-09-08    247
2020-11-20    247
2020-09-09    247
2020-09-10    247
2020-09-12    247
2020-09-20    247
2020-09-13    247
2020-09-14    247
2020-09-15    247
2020-09-16    247
2020-09-17    247
2020-09-18    247
2020-09-19    247
2020-10-11    247
2020-10-25    247
2020-10-13    247
2020-11-01    247
2020-11-04    247
2020-11-05    247
2020-11-06    247
2020-11-07    247
2020-11-08    247
2020-11-09    247
2020-11-10    247
2020-11-11    247
2020-11-12    247
2020-11-13    247
2020-11-14    247
2020-11-18    247
2020-11-17    247
2020-11-16    247
2020-10-14    247
2020-11-03    247
2020-11-02    247
2020-10-31    247
2020-11-19    247
2020-10-15    247
2020-10-16    247
2020-10-17    247
2020-10-18    247
2020-10-19    247
2020-10-20    247
2020-10-22    247
2020-10-21    247
2020-10-23    247
2020-10-24    247
2020-10-26    247
2020-10-27    247
2020-10-28    247
2020-10-29    247
2020-10-30    247
2020-11-15    247
2020-03-23    246
2020-03-20    246
2020-03-21    246
2020-03-22    246
2020-03-24    246
2020-03-25    246
2020-03-26    246
2020-03-30    246
2020-03-28    246
2020-03-31    246
2020-05-15    246
2020-03-29    246
2020-03-27    246
2020-03-15    245
2020-03-14    245
2020-03-13    245
2020-03-16    245
2020-03-11    245
2020-03-10    245
2020-03-09    245
2020-03-08    245
2020-03-07    245
2020-03-17    245
2020-03-18    245
2020-03-19    245
2020-03-12    245
2020-03-04    244
2020-03-01    244
2020-03-06    244
2020-03-05    244
2020-03-03    244
2020-03-02    244
2020-02-29    243
2020-02-02    243
2020-02-07    243
2020-02-06    243
2020-02-05    243
2020-02-04    243
2020-02-03    243
2020-01-31    243
2020-02-28    243
2020-02-09    243
2023-04-06    243
2023-04-07    243
2023-04-08    243
2023-04-09    243
2020-02-08    243
2020-02-01    243
2020-02-10    243
2020-02-11    243
2020-02-27    243
2020-02-26    243
2020-02-25    243
2020-02-24    243
2020-02-23    243
2020-02-22    243
2020-02-21    243
2020-02-20    243
2020-02-12    243
2020-02-13    243
2020-02-14    243
2020-02-15    243
2020-02-16    243
2020-02-17    243
2020-02-18    243
2020-02-19    243
2020-01-28    242
2020-01-30    242
2023-04-12    242
2023-04-11    242
2023-04-10    242
2020-01-16    242
2020-01-27    242
2020-01-17    242
2020-01-18    242
2020-01-19    242
2020-01-20    242
2020-01-29    242
2020-01-22    242
2020-01-23    242
2020-01-24    242
2020-01-25    242
2020-01-26    242
2020-01-21    242
2020-01-03    241
2020-01-04    241
2020-01-15    241
2020-01-14    241
2020-01-13    241
2020-01-12    241
2020-01-11    241
2020-01-10    241
2020-01-09    241
2020-01-08    241
2020-01-07    241
2020-01-06    241
2020-01-05    241
2020-01-01      2
2020-01-02      2
Name: count, dtype: int64
--------------------------------------------------
Column: tests_units
tests_units
tests performed    216495
people tested       52600
samples tested      25520
units unclear        1225
Name: count, dtype: int64
--------------------------------------------------

To resolve inconsistencies and outliers in the categorical data, we applied the following steps:

Steps to Clean the Data:

  1. Remove Outliers:
    • If OWID_CYN and ESH are not valid iso_code values, we can drop them.
  2. Fix Inconsistencies in location:
    • location should contain only country/region names, but values like "Lower middle income" and "Africa" are income-group and continent aggregates, not countries.
    • We can filter out non-country values.
  3. Handle Missing or Incorrect Values:
    • If some categories are missing or have typos, we can correct or remove them.

What This Code Does:

✅ Removes outlier iso_code values (OWID_CYN and ESH).
✅ Filters out incorrect location values (like "Lower middle income", "Africa", etc.).
✅ Ensures only valid country/region names remain in location.
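
The cleaning steps listed above can be sketched as follows. This is a minimal illustration on a toy frame, not the notebook's original cell: the helper name drop_non_countries, the toy rows, and the contents of non_country_locations are assumptions, and the aggregate list should be checked against df['location'].unique() before it is applied to the real data.

```python
import pandas as pd

# Toy frame standing in for the real dataset (rows are illustrative only).
sample = pd.DataFrame({
    "iso_code": ["AFG", "OWID_CYN", "ESH", "OWID_AFR", "OWID_LMC", "PAK"],
    "location": ["Afghanistan", "Northern Cyprus", "Western Sahara",
                 "Africa", "Lower middle income", "Pakistan"],
})

# iso_code values treated as outliers in the steps above.
invalid_iso_codes = ["OWID_CYN", "ESH"]

# Assumed list of aggregate (non-country) labels; extend it as needed.
non_country_locations = [
    "Africa", "Asia", "Europe", "European Union",
    "North America", "Oceania", "South America",
    "Lower middle income",
]

def drop_non_countries(frame, bad_codes, bad_locations):
    """Return a copy without the flagged iso_code and location values."""
    keep = ~frame["iso_code"].isin(bad_codes) & ~frame["location"].isin(bad_locations)
    return frame.loc[keep].copy()

cleaned = drop_non_countries(sample, invalid_iso_codes, non_country_locations)
print(cleaned)
```

Filtering with isin and a boolean mask keeps the original frame untouched, so the row counts before and after can be compared to confirm how many rows were dropped.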

📌 Step 8: Data Type Conversion & Column Renaming

df['date'] = pd.to_datetime(df['date'], errors='coerce')

Explanation:

✅ pd.to_datetime(df['date']) converts the column to a proper datetime format.
✅ errors='coerce' ensures that any invalid values (e.g., wrong formats) become NaT (Not a Time) instead of raising an error.
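
Because errors='coerce' silently turns bad values into NaT, it can help to count them after the conversion. The sketch below uses a small made-up series (raw_dates is an assumption, not part of the dataset) to show the behaviour:

```python
import pandas as pd

# Made-up values: two valid dates, one malformed string, one missing value.
raw_dates = pd.Series(["2020-01-01", "2021-08-24", "not a date", None])

# Invalid or missing entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw_dates, errors="coerce")

# Count how many entries could not be parsed (or were missing to begin with).
n_invalid = int(parsed.isna().sum())

print(parsed)
print(f"Unparseable or missing dates: {n_invalid}")
```

On the real dataset, the same check would be df['date'].isna().sum() right after the conversion; the df.info() output further down reports no nulls in the date column.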

# Rename columns
df = df.rename(columns={
    'iso_code': 'country_code',
    'location': 'country',
    'total_cases': 'total_confirmed_cases',
    'new_cases': 'new_confirmed_cases',
    'total_deaths': 'total_deaths_reported',
    'new_deaths': 'new_deaths_reported'
})

Rename columns for better readability and consistency.

df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 295840 entries, 0 to 302511
Data columns (total 67 columns):
 #   Column                                      Non-Null Count   Dtype         
---  ------                                      --------------   -----         
 0   country_code                                295840 non-null  object        
 1   continent                                   295840 non-null  object        
 2   country                                     295840 non-null  object        
 3   date                                        295840 non-null  datetime64[ns]
 4   total_confirmed_cases                       295840 non-null  float64       
 5   new_confirmed_cases                         295840 non-null  float64       
 6   new_cases_smoothed                          295840 non-null  float64       
 7   total_deaths_reported                       295840 non-null  float64       
 8   new_deaths_reported                         295840 non-null  float64       
 9   new_deaths_smoothed                         295840 non-null  float64       
 10  total_cases_per_million                     295840 non-null  float64       
 11  new_cases_per_million                       295840 non-null  float64       
 12  new_cases_smoothed_per_million              295840 non-null  float64       
 13  total_deaths_per_million                    295840 non-null  float64       
 14  new_deaths_per_million                      295840 non-null  float64       
 15  new_deaths_smoothed_per_million             295840 non-null  float64       
 16  reproduction_rate                           295840 non-null  float64       
 17  icu_patients                                295840 non-null  float64       
 18  icu_patients_per_million                    295840 non-null  float64       
 19  hosp_patients                               295840 non-null  float64       
 20  hosp_patients_per_million                   295840 non-null  float64       
 21  weekly_icu_admissions                       295840 non-null  float64       
 22  weekly_icu_admissions_per_million           295840 non-null  float64       
 23  weekly_hosp_admissions                      295840 non-null  float64       
 24  weekly_hosp_admissions_per_million          295840 non-null  float64       
 25  total_tests                                 295840 non-null  float64       
 26  new_tests                                   295840 non-null  float64       
 27  total_tests_per_thousand                    295840 non-null  float64       
 28  new_tests_per_thousand                      295840 non-null  float64       
 29  new_tests_smoothed                          295840 non-null  float64       
 30  new_tests_smoothed_per_thousand             295840 non-null  float64       
 31  positive_rate                               295840 non-null  float64       
 32  tests_per_case                              295840 non-null  float64       
 33  tests_units                                 295840 non-null  object        
 34  total_vaccinations                          295840 non-null  float64       
 35  people_vaccinated                           295840 non-null  float64       
 36  people_fully_vaccinated                     295840 non-null  float64       
 37  total_boosters                              295840 non-null  float64       
 38  new_vaccinations                            295840 non-null  float64       
 39  new_vaccinations_smoothed                   295840 non-null  float64       
 40  total_vaccinations_per_hundred              295840 non-null  float64       
 41  people_vaccinated_per_hundred               295840 non-null  float64       
 42  people_fully_vaccinated_per_hundred         295840 non-null  float64       
 43  total_boosters_per_hundred                  295840 non-null  float64       
 44  new_vaccinations_smoothed_per_million       295840 non-null  float64       
 45  new_people_vaccinated_smoothed              295840 non-null  float64       
 46  new_people_vaccinated_smoothed_per_hundred  295840 non-null  float64       
 47  stringency_index                            295840 non-null  float64       
 48  population_density                          295840 non-null  float64       
 49  median_age                                  295840 non-null  float64       
 50  aged_65_older                               295840 non-null  float64       
 51  aged_70_older                               295840 non-null  float64       
 52  gdp_per_capita                              295840 non-null  float64       
 53  extreme_poverty                             295840 non-null  float64       
 54  cardiovasc_death_rate                       295840 non-null  float64       
 55  diabetes_prevalence                         295840 non-null  float64       
 56  female_smokers                              295840 non-null  float64       
 57  male_smokers                                295840 non-null  float64       
 58  handwashing_facilities                      295840 non-null  float64       
 59  hospital_beds_per_thousand                  295840 non-null  float64       
 60  life_expectancy                             295840 non-null  float64       
 61  human_development_index                     295840 non-null  float64       
 62  population                                  295840 non-null  float64       
 63  excess_mortality_cumulative_absolute        295840 non-null  float64       
 64  excess_mortality_cumulative                 295840 non-null  float64       
 65  excess_mortality                            295840 non-null  float64       
 66  excess_mortality_cumulative_per_million     295840 non-null  float64       
dtypes: datetime64[ns](1), float64(62), object(4)
memory usage: 153.5+ MB

📌 Step 9: Detecting & Handling Outliers


# Select numerical columns
numerical_cols = df.select_dtypes(include=[np.number]).columns
n_cols = len(numerical_cols)
n_rows = math.ceil(n_cols / 4)  # Adjust the number of rows based on the number of columns

plt.figure(figsize=(20, 5 * n_rows))  # Adjust figure size based on the number of rows
for i, col in enumerate(numerical_cols, 1):
    plt.subplot(n_rows, 4, i)
    sns.boxplot(y=df[col])
    plt.title(col)
plt.tight_layout()
plt.show()

from scipy.stats import zscore

# Calculate Z-scores for numerical columns
z_scores = df[numerical_cols].apply(zscore)

# Define a threshold (e.g., 3 or -3)
threshold = 3
outliers_z = (z_scores > threshold) | (z_scores < -threshold)

# Count outliers in each column
print("Outliers detected using Z-Score:")
print(outliers_z.sum())
Outliers detected using Z-Score:
total_confirmed_cases                         4269
new_confirmed_cases                           1569
new_cases_smoothed                            1564
total_deaths_reported                         6123
new_deaths_reported                           4001
new_deaths_smoothed                           4106
total_cases_per_million                       6690
new_cases_per_million                         1866
new_cases_smoothed_per_million                3621
total_deaths_per_million                      6286
new_deaths_per_million                        3053
new_deaths_smoothed_per_million               5349
reproduction_rate                             1113
icu_patients                                  2119
icu_patients_per_million                      6180
hosp_patients                                 2046
hosp_patients_per_million                     4079
weekly_icu_admissions                         2401
weekly_icu_admissions_per_million             2963
weekly_hosp_admissions                        9758
weekly_hosp_admissions_per_million            3832
total_tests                                    518
new_tests                                     2759
total_tests_per_thousand                      5848
new_tests_per_thousand                        5898
new_tests_smoothed                            1149
new_tests_smoothed_per_thousand               3184
positive_rate                                 7901
tests_per_case                                 589
total_vaccinations                            3632
people_vaccinated                             3362
people_fully_vaccinated                       3411
total_boosters                                4640
new_vaccinations                              1790
new_vaccinations_smoothed                     1756
total_vaccinations_per_hundred                1058
people_vaccinated_per_hundred                    0
people_fully_vaccinated_per_hundred              0
total_boosters_per_hundred                    3405
new_vaccinations_smoothed_per_million         6337
new_people_vaccinated_smoothed                1640
new_people_vaccinated_smoothed_per_hundred    5740
stringency_index                                 0
population_density                            4344
median_age                                       0
aged_65_older                                 2392
aged_70_older                                 2392
gdp_per_capita                                5571
extreme_poverty                               2392
cardiovasc_death_rate                         1196
diabetes_prevalence                           4784
female_smokers                                   0
male_smokers                                  2392
handwashing_facilities                           0
hospital_beds_per_thousand                    8372
life_expectancy                                  0
human_development_index                          0
population                                    4784
excess_mortality_cumulative_absolute          8036
excess_mortality_cumulative                   3854
excess_mortality                              5019
excess_mortality_cumulative_per_million       5224
dtype: int64

# Apply winsorization to cap the top and bottom 1% of each numerical column
df[numerical_cols] = df[numerical_cols].apply(lambda x: winsorize(x, limits=[0.01, 0.01]))

Should We Remove Outliers?

🔹 Keep Outliers If:

🔹 Remove Outliers If:


How to Handle Outliers?

If we decide to deal with outliers, here are the best approaches for our dataset:

✅ 1. Winsorization (Capping Outliers)

✅ 2. Transforming Data (Log or Square Root Transform)

✅ 3. Removing Outliers Based on Z-Score or IQR
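The three options listed above can be compared on a small made-up series. This is only a sketch under stated assumptions: the values in `s`, the 10% winsorization limits, and the 1.5 * IQR fence are illustrative choices, not settings taken from the notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical daily counts with one extreme spike
s = pd.Series([0, 2, 3, 5, 8, 13, 21, 34, 55, 5000], dtype=float)

# 1. Winsorization: replace the lowest and highest 10% with their nearest neighbours
capped = pd.Series(np.asarray(winsorize(s.to_numpy(), limits=[0.1, 0.1])))

# 2. Log transform: log1p handles zeros and compresses the spike
logged = np.log1p(s)

# 3. IQR rule: keep only values within 1.5 * IQR of the quartiles
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
kept = s[s.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(capped.max(), round(logged.max(), 2), len(kept))
```

Winsorization and the log transform keep every row, while the IQR rule drops the spike entirely, which is the trade-off discussed in this step.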


Best Approach for our Data

Since our dataset includes real-world COVID-19 data, the best approach is to use Winsorization or log transformation rather than direct removal because:
1️⃣ COVID-19 cases & deaths have natural extreme spikes.
2️⃣ Removing outliers may hide crucial trends (waves, lockdown effects).
3️⃣ Capping or compressing extreme values is better than removing them entirely.




# Calculate Z-scores for numerical columns
z_scores = df[numerical_cols].apply(zscore)

# Define a threshold (e.g., 3 or -3)
threshold = 3
outliers_z = (z_scores > threshold) | (z_scores < -threshold)

# Count outliers in each column
print("Outliers detected using Z-Score:")
print(outliers_z.sum())
Outliers detected using Z-Score:
total_confirmed_cases                          5337
new_confirmed_cases                            6543
new_cases_smoothed                             6702
total_deaths_reported                          7215
new_deaths_reported                            6627
new_deaths_smoothed                            6567
total_cases_per_million                        6869
new_cases_per_million                          7602
new_cases_smoothed_per_million                 7674
total_deaths_per_million                       6592
new_deaths_per_million                         8613
new_deaths_smoothed_per_million                9056
reproduction_rate                                 0
icu_patients                                  12520
icu_patients_per_million                       7859
hosp_patients                                     0
hosp_patients_per_million                      5069
weekly_icu_admissions                          5435
weekly_icu_admissions_per_million              4564
weekly_hosp_admissions                        10595
weekly_hosp_admissions_per_million             4741
total_tests                                    7934
new_tests                                      8230
total_tests_per_thousand                       6546
new_tests_per_thousand                         6563
new_tests_smoothed                             6676
new_tests_smoothed_per_thousand                7098
positive_rate                                  7901
tests_per_case                                 4863
total_vaccinations                             5208
people_vaccinated                              4846
people_fully_vaccinated                        4798
total_boosters                                 5211
new_vaccinations                               6333
new_vaccinations_smoothed                      6228
total_vaccinations_per_hundred                    0
people_vaccinated_per_hundred                     0
people_fully_vaccinated_per_hundred               0
total_boosters_per_hundred                     3415
new_vaccinations_smoothed_per_million          7593
new_people_vaccinated_smoothed                 6022
new_people_vaccinated_smoothed_per_hundred     9025
stringency_index                                  0
population_density                             5540
median_age                                        0
aged_65_older                                     0
aged_70_older                                     0
gdp_per_capita                                 5571
extreme_poverty                                   0
cardiovasc_death_rate                             0
diabetes_prevalence                            4784
female_smokers                                    0
male_smokers                                      0
handwashing_facilities                            0
hospital_beds_per_thousand                     8372
life_expectancy                                   0
human_development_index                           0
population                                     5980
excess_mortality_cumulative_absolute           8036
excess_mortality_cumulative                    3678
excess_mortality                               7339
excess_mortality_cumulative_per_million        5224
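dtype: int64

✅ Extreme values are now capped at roughly the 1st and 99th percentiles, so we are good to go. The Z-score counts above do not fall to zero, and rise for many columns, because capping shrinks each column's standard deviation and the capped values can still sit more than 3 standard deviations from the new mean.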
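When comparing the two Z-score tables, it helps to remember that Z-scores are recomputed from the capped data. The sketch below uses a hypothetical column (the values in `x` and the helper `count_z_outliers` are assumptions for illustration) to show how the number of flagged values can grow after winsorization even though the extreme spikes are gone.

```python
import numpy as np
from scipy.stats import zscore
from scipy.stats.mstats import winsorize

# Hypothetical column: mostly zeros, a few moderate values, and ten extreme spikes
x = np.array([0.0] * 950 + [10.0] * 40 + [1000.0] * 10)


def count_z_outliers(values, threshold=3):
    """Count values whose absolute Z-score exceeds the threshold."""
    return int((np.abs(zscore(values)) > threshold).sum())


before = count_z_outliers(x)

# Cap the top and bottom 1%, using the same limits as the winsorization cell
x_capped = np.asarray(winsorize(x, limits=[0.01, 0.01]))
after = count_z_outliers(x_capped)

print(before, after)
```

After capping, the standard deviation is far smaller, so the moderate values that were unremarkable before now exceed the threshold. A count of Z-score flags is therefore not a reliable measure of whether winsorization worked; checking the new minimum and maximum of each column is more direct.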

📌 Step 10: Exploratory Data Analysis (EDA)

df.describe(include='all')
country_code continent country date total_confirmed_cases new_confirmed_cases new_cases_smoothed total_deaths_reported new_deaths_reported new_deaths_smoothed total_cases_per_million new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate icu_patients icu_patients_per_million hosp_patients hosp_patients_per_million weekly_icu_admissions weekly_icu_admissions_per_million weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units total_vaccinations people_vaccinated people_fully_vaccinated total_boosters new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred people_vaccinated_per_hundred people_fully_vaccinated_per_hundred total_boosters_per_hundred new_vaccinations_smoothed_per_million new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred stringency_index population_density median_age aged_65_older aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand life_expectancy human_development_index population excess_mortality_cumulative_absolute excess_mortality_cumulative excess_mortality excess_mortality_cumulative_per_million
count 295840 295840 295840 295840 2.958400e+05 295840.000000 295840.000000 2.958400e+05 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 2.958400e+05 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840 2.958400e+05 2.958400e+05 2.958400e+05 2.958400e+05 2.958400e+05 2.958400e+05 295840.000000 295840.000000 295840.000000 295840.000000 295840.00000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 295840.000000 2.958400e+05 2.958400e+05 295840.000000 295840.000000 295840.000000
unique 248 6 248 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
top ARG Africa Argentina NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN tests performed NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
freq 1198 71760 1198 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 216495 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
mean NaN NaN NaN 2021-08-23 13:55:08.729042944 3.294126e+06 3635.738247 3704.414733 4.774526e+04 37.930527 38.733286 92596.267365 128.229897 140.757851 848.157616 0.884128 0.945825 0.752354 219.250882 5.212657 2984.668493 80.188060 72.420673 3.708227 1522.244355 53.871693 2.249765e+07 28121.147739 1249.748769 2.355675 33146.061060 1.784718 0.142600 132.981153 NaN 9.991498e+07 4.187658e+07 3.890271e+07 2.646347e+07 7.137484e+04 5.885654e+04 123.097460 53.530107 49.582787 33.543533 1525.56921 20538.642090 0.054311 34.424177 308.207316 30.034724 8.514619 5.268092 19076.362702 14.234671 263.110822 8.600771 11.395997 31.541347 49.450266 3.068764 73.646679 0.719877 5.323516e+07 6.957121e+04 13.511943 10.712484 2233.221401
min NaN NaN NaN 2020-01-01 00:00:00 4.000000e+00 0.000000 0.000000 1.000000e+00 0.000000 0.000000 1.017000 0.000000 0.000000 0.321000 0.000000 0.000000 -0.010000 0.000000 0.000000 22.000000 6.562000 1.000000 0.552000 5.000000 1.893000 7.930000e+02 8.000000 0.497000 0.003000 14.000000 0.006000 0.000000 1.100000 NaN 1.970000e+02 5.490000e+02 1.150000e+03 4.000000e+01 0.000000e+00 0.000000e+00 0.040000 0.040000 0.070000 0.000000 0.00000 0.000000 0.000000 0.000000 3.078000 16.400000 1.307000 1.114000 752.788000 0.100000 85.755000 1.910000 0.200000 8.500000 1.188000 0.300000 54.330000 0.398000 1.893000e+03 -4.658800e+03 -10.340000 -36.510000 -1034.877900
25% NaN NaN NaN 2020-10-29 00:00:00 7.342750e+03 0.000000 0.714000 1.230000e+02 0.000000 0.000000 2492.635000 0.000000 0.177000 57.555000 0.000000 0.000000 0.380000 8.000000 1.423000 385.000000 26.645000 7.000000 1.434000 162.000000 23.683000 3.950400e+05 1278.000000 68.836750 0.222000 598.750000 0.124000 0.016100 4.800000 NaN 3.800000e+05 2.052590e+05 1.848010e+05 6.133800e+04 5.220000e+02 9.100000e+01 48.210000 31.070000 27.120000 6.330000 42.00000 10.000000 0.000000 11.110000 39.497000 21.700000 3.526000 2.063000 4227.630000 0.600000 176.957000 5.460000 1.700000 21.000000 19.351000 1.300000 69.020000 0.594000 4.099890e+05 2.317750e+02 6.110000 -5.950000 651.846070
50% NaN NaN NaN 2021-08-24 00:00:00 6.659900e+04 14.000000 33.286000 1.325000e+03 0.000000 0.143000 24861.928000 2.166000 9.319000 389.266000 0.000000 0.024000 0.850000 29.000000 2.844000 550.000000 55.594000 30.000000 1.619000 515.000000 41.806000 2.870685e+06 4118.000000 411.155000 0.613000 3338.000000 0.545000 0.080300 12.100000 NaN 3.739158e+06 2.239434e+06 1.993200e+06 7.798590e+05 5.221000e+03 1.313000e+03 120.180000 59.710000 54.120000 28.660000 295.00000 231.000000 0.005000 27.500000 99.110000 29.100000 6.211000 3.519000 12895.635000 2.200000 243.811000 7.200000 6.300000 30.200000 44.600000 2.400000 75.000000 0.738000 5.637022e+06 5.963201e+03 13.190000 2.890000 1753.307300
75% NaN NaN NaN 2022-06-18 00:00:00 6.305112e+05 439.000000 527.857000 9.854000e+03 5.000000 5.429000 118749.429000 68.085500 102.821750 1309.040000 0.389000 0.723000 1.050000 165.000000 8.174000 3350.000000 113.283750 77.000000 3.797000 1291.000000 67.691000 1.234731e+07 19244.000000 1433.869000 1.983000 15032.750000 1.729000 0.196200 50.000000 NaN 2.276167e+07 1.067369e+07 9.792266e+06 6.936989e+06 3.515300e+04 1.670425e+04 193.410000 77.600000 71.980000 54.670000 1653.00000 4137.000000 0.045000 50.970000 237.012000 38.700000 13.260000 8.160000 26808.164000 23.500000 336.717000 10.790000 20.100000 41.100000 82.502000 4.000000 79.190000 0.828000 2.620798e+07 4.980240e+04 20.400000 15.210000 3305.707800
max NaN NaN NaN 2023-04-12 00:00:00 1.165863e+08 125143.000000 122485.857000 1.398618e+06 1287.000000 1294.143000 599142.341000 2602.864000 2416.903000 4617.392000 15.549000 12.994000 1.770000 2089.000000 55.417000 14975.000000 409.450000 464.000000 20.936000 14366.000000 221.355000 3.926741e+08 409271.000000 14707.401000 39.495000 699733.000000 22.221000 0.952300 4566.400000 NaN 3.415934e+09 1.302773e+09 1.272830e+09 8.283965e+08 1.732373e+06 1.700593e+06 298.090000 105.820000 105.820000 128.220000 15745.00000 709346.000000 0.694000 90.740000 7915.731000 47.900000 23.021000 16.240000 104861.851000 71.700000 597.029000 27.250000 43.000000 65.800000 100.000000 13.050000 84.860000 0.955000 1.425887e+09 1.282260e+06 51.740000 166.230000 10066.715000
std NaN NaN NaN NaN 1.433953e+07 15945.612323 15797.481707 1.929127e+05 164.719089 165.323935 140934.885507 372.025550 356.389151 1058.946871 2.427710 2.188158 0.445604 407.447529 8.071772 4510.269729 74.258545 90.221415 3.894764 2997.028427 40.987913 5.856635e+07 65431.507862 2320.403624 5.735614 92938.518533 3.362456 0.184563 571.393907 NaN 4.142194e+08 1.651486e+08 1.583957e+08 1.073620e+08 2.258135e+05 2.209618e+05 83.971049 29.004464 28.092447 29.342011 2766.29427 87392.760171 0.119391 23.759028 959.414433 9.029077 6.005725 3.999579 20207.929852 20.104251 121.463806 4.963716 11.459313 13.467883 32.232231 2.541954 7.297570 0.147177 1.939979e+08 1.981024e+05 11.517135 30.179084 2226.839542

📌 Step 11: Data Visualization & Key Insights

COVID-19 Cases by Continent and Country

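# Use each country's peak cumulative totals; passing every daily row
# would add the cumulative columns up across dates
df_latest = df.groupby(["continent", "country"], as_index=False)[
    ["total_confirmed_cases", "total_deaths_reported"]].max()

fig = px.sunburst(df_latest,
                  path=["continent", "country"],
                  values="total_confirmed_cases",
                  color="total_deaths_reported",
                  title="🌞 COVID-19 Spread Across Continents & Countries")

fig.show()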

COVID-19 Metrics by Country

# Select a subset of columns for the Parallel Coordinates Chart
df_parallel = df[['country', 'total_confirmed_cases', 'total_deaths_reported', 'total_vaccinations', 'population']]

# Create Parallel Coordinates Chart
fig_parallel = px.parallel_coordinates(df_parallel, color='total_confirmed_cases',
                                       title='Parallel Coordinates: COVID-19 Metrics by Country',
                                       labels={'total_confirmed_cases': 'Total Cases', 'total_deaths_reported': 'Total Deaths', 'total_vaccinations': 'Total Vaccinations', 'population': 'Population'},
                                       template='plotly_dark')

# Show the plot
fig_parallel.show()

Cases, Deaths, and Vaccinations

# Create 3D Scatter Plot
fig_3d = px.scatter_3d(df, x='total_confirmed_cases', y='total_deaths_reported', z='total_vaccinations',
                       color='continent', title='3D Scatter Plot: Cases, Deaths, and Vaccinations',
                       labels={'total_confirmed_cases': 'Total Cases', 'total_deaths_reported': 'Total Deaths', 'total_vaccinations': 'Total Vaccinations'},
                       template='plotly_dark')

# Show the plot
fig_3d.show()

Animated Bubble Chart: Cases and Deaths Over Time

# Create Animated Bubble Chart
fig_bubble = px.scatter(df, x='total_confirmed_cases', y='total_deaths_reported', size='population',
                        color='continent', animation_frame=df['date'].dt.strftime('%Y-%m-%d'),
                        title='Animated Bubble Chart: Cases and Deaths Over Time',
                        labels={'total_confirmed_cases': 'Total Cases', 'total_deaths_reported': 'Total Deaths', 'population': 'Population'},
                        template='plotly_dark')

# Adjust animation speed
fig_bubble.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 100
fig_bubble.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 50

# Show the plot
fig_bubble.show()

Cases vs. Deaths vs. Population Bubble Chart

fig = px.scatter(df,
                 x="total_confirmed_cases",
                 y="total_deaths_reported",
                 size="population",
                 color="continent",
                 hover_name="country",
                 title="⚡ Cases vs. Deaths vs. Population Bubble Chart")

fig.show()

COVID-19 Testing Rate Per Thousand

fig = px.choropleth(df,
                    locations="country",
                    locationmode="country names",
                    color="new_tests_per_thousand",
                    hover_name="country",
                    animation_frame=df['date'].astype(str),
                    title="🧪 COVID-19 Testing Rate Per Thousand",
                    color_continuous_scale="Purples")

fig.show()

COVID-19 Testing & Positivity Rate

fig = px.treemap(df,
                 path=["continent", "country"],
                 values="total_tests",
                 color="positive_rate",
                 title="🧪 COVID-19 Testing & Positivity Rate")

fig.show()

Daily & Cumulative Confirmed Cases Over Time

# Set seaborn style
sns.set(style="whitegrid")

###  Daily & Cumulative Confirmed Cases Over Time ###
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x='date', y='total_confirmed_cases', label='Total Confirmed Cases', color='blue')
sns.lineplot(data=df, x='date', y='new_confirmed_cases', label='New Cases', color='red')
plt.xlabel("Date")
plt.ylabel("Cases")
plt.title("COVID-19 Confirmed Cases Over Time")
plt.legend()
plt.xticks(rotation=45)
plt.show()

Top 10 Countries with Highest Cases

###  Top 10 Countries with Highest Cases ###
top_countries = df.groupby('country')['total_confirmed_cases'].max().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 5))
sns.barplot(x=top_countries.values, y=top_countries.index, palette="Reds_r")
plt.xlabel("Total Confirmed Cases")
plt.ylabel("Country")
plt.title("Top 10 Countries with Highest COVID-19 Cases")
plt.show()

Case Fatality Rate (CFR%) Trend

###  Case Fatality Rate (CFR%) Trend ###
df["CFR"] = (df["total_deaths_reported"] / df["total_confirmed_cases"]) * 100
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x='date', y='CFR', color='purple')
plt.xlabel("Date")
plt.ylabel("Case Fatality Rate (%)")
plt.title("COVID-19 Case Fatality Rate Over Time")
plt.show()

Vaccination Progress (Top 10 Countries)

###  Vaccination Progress (Top 10 Countries) ###
top_vaccine_countries = df.groupby('country')['total_vaccinations'].max().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 5))
sns.barplot(x=top_vaccine_countries.values, y=top_vaccine_countries.index, palette="Blues_r")
plt.xlabel("Total Vaccinations")
plt.ylabel("Country")
plt.title("Top 10 Countries by Total Vaccinations")
plt.show()

Top 10 Countries with Highest Cases - Interactive Bar Chart

### Top 10 Countries with Highest Cases - Interactive Bar Chart ###
top_countries = df.groupby('country')['total_confirmed_cases'].max().sort_values(ascending=False).head(10)

fig = px.bar(x=top_countries.values, y=top_countries.index,
             orientation='h',
             title="Top 10 Countries with Highest COVID-19 Cases",
             labels={'x': 'Total Confirmed Cases', 'y': 'Country'},
             color=top_countries.values, color_continuous_scale='Reds')

fig.update_layout(yaxis={'categoryorder': 'total ascending'})
fig.show()

Vaccination Progress - Top 10 Countries

###  Vaccination Progress - Top 10 Countries ###
top_vaccine_countries = df.groupby('country')['total_vaccinations'].max().sort_values(ascending=False).head(10)

fig = px.bar(x=top_vaccine_countries.values, y=top_vaccine_countries.index,
             orientation='h',
             title="Top 10 Countries by Total Vaccinations",
             labels={'x': 'Total Vaccinations', 'y': 'Country'},
             color=top_vaccine_countries.values, color_continuous_scale='Blues')

fig.update_layout(yaxis={'categoryorder': 'total ascending'})
fig.show()

Global COVID-19 Cases Over Time



# Create a choropleth map for total confirmed cases
fig = px.choropleth(df,
                    locations="country",
                    locationmode="country names",
                    color="total_confirmed_cases",
                    hover_name="country",
                    animation_frame=df['date'].astype(str),
                    title="Global COVID-19 Cases Over Time",
                    color_continuous_scale="Reds")

fig.show()

COVID-19 Cases Growth Over Time - Top 10 Countries


# Get top 10 affected countries
top_countries = df.groupby("country")["total_confirmed_cases"].max().nlargest(10).index
df_top = df[df["country"].isin(top_countries)]

# Create animated bar chart
fig = px.bar(df_top,
             x="total_confirmed_cases",
             y="country",
             color="country",
             animation_frame=df_top['date'].astype(str),
             title="📈 COVID-19 Cases Growth Over Time - Top 10 Countries",
             labels={'total_confirmed_cases': 'Total Cases', 'country': 'Country'},
             orientation='h')

fig.update_layout(yaxis={'categoryorder': 'total ascending'})
fig.show()

Global COVID-19 Vaccination Progress Over Time

fig = px.choropleth(df,
                    locations="country",
                    locationmode="country names",
                    color="people_fully_vaccinated_per_hundred",
                    hover_name="country",
                    animation_frame=df['date'].astype(str),
                    title="💉 Global COVID-19 Vaccination Progress Over Time",
                    color_continuous_scale="Blues")

fig.show()

COVID-19 Mortality Rate Per Million

fig = px.choropleth(df,
                    locations="country",
                    locationmode="country names",
                    color="total_deaths_per_million",
                    hover_name="country",
                    animation_frame=df['date'].astype(str),
                    title="⚰️ COVID-19 Mortality Rate Per Million",
                    color_continuous_scale="OrRd")

fig.show()

Daily New COVID-19 Cases and Deaths

# Group by date and calculate daily new cases and deaths
df_daily = df.groupby('date').agg({
    'new_confirmed_cases': 'sum',
    'new_deaths_reported': 'sum'
}).reset_index()

# Plot daily new cases and deaths
plt.figure(figsize=(14, 6))
plt.plot(df_daily['date'], df_daily['new_confirmed_cases'], label='Daily New Cases')
plt.plot(df_daily['date'], df_daily['new_deaths_reported'], label='Daily New Deaths')
plt.title('Daily New COVID-19 Cases and Deaths')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend()
plt.grid()
plt.show()
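
Daily counts tend to rise and fall with weekly reporting cycles, which can make the raw lines above hard to read. A minimal sketch of smoothing with a 7-day trailing mean is shown here; the toy `df_daily_demo` frame and the `smooth_daily` helper are illustrative assumptions rather than part of this notebook, and only the column names `date` and `new_confirmed_cases` are taken from it.

```python
import pandas as pd


def smooth_daily(df_daily, column, window=7):
    """Return a trailing rolling mean of `column`, computed in date order.

    `smooth_daily` is a hypothetical helper written for this sketch.
    """
    ordered = df_daily.sort_values("date")
    # min_periods=1 keeps the first few days instead of returning NaN
    return ordered[column].rolling(window=window, min_periods=1).mean()


# Toy data standing in for the notebook's df_daily (assumed shape)
df_daily_demo = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "new_confirmed_cases": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# The result keeps the original index, so it aligns on assignment
df_daily_demo["cases_7d_avg"] = smooth_daily(df_daily_demo, "new_confirmed_cases")
print(df_daily_demo.tail(3))
```

The same call with `window=30` would give a monthly average; plotting `cases_7d_avg` alongside the raw column is one way to show the trend without hiding the original data.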

Global COVID-19 Vaccination Progress

# Group by date and calculate total vaccinations
df_vaccination = df.groupby('date').agg({
    'total_vaccinations': 'sum',
    'people_vaccinated': 'sum',
    'people_fully_vaccinated': 'sum'
}).reset_index()

# Plot vaccination progress
plt.figure(figsize=(14, 6))
plt.plot(df_vaccination['date'], df_vaccination['total_vaccinations'], label='Total Vaccinations')
plt.plot(df_vaccination['date'], df_vaccination['people_vaccinated'], label='People Vaccinated')
plt.plot(df_vaccination['date'], df_vaccination['people_fully_vaccinated'], label='People Fully Vaccinated')
plt.title('Global COVID-19 Vaccination Progress')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend()
plt.grid()
plt.show()
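
The vaccination columns are cumulative, so if a country skips a reporting day and the value is missing, a plain sum by date treats that country as zero and the global curve can stall or dip. Whether this applies depends on how missing values were handled earlier in the notebook. The sketch below, on assumed toy data, carries each country's last reported value forward before summing.

```python
import numpy as np
import pandas as pd

# Toy data (assumed): country B does not report on the second day
demo = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"] * 2),
    "people_vaccinated": [100.0, 150.0, 200.0, 50.0, np.nan, 90.0],
})

# Naive sum: the missing value is skipped, so the total stalls on day 2
naive = demo.groupby("date")["people_vaccinated"].sum()

# Carry each country's last reported cumulative value forward, then sum
filled = demo.sort_values(["country", "date"]).copy()
filled["people_vaccinated"] = filled.groupby("country")["people_vaccinated"].ffill()
global_total = filled.groupby("date")["people_vaccinated"].sum()

print(naive.tolist())         # [150.0, 150.0, 290.0]
print(global_total.tolist())  # [150.0, 200.0, 290.0]
```

Forward-filling only makes sense for cumulative columns; daily columns such as `new_confirmed_cases` should not be filled this way.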

Global COVID-19 Cases and Deaths Over Time

# Group by date and calculate total cases and deaths
df_grouped = df.groupby('date').agg({
    'total_confirmed_cases': 'sum',
    'total_deaths_reported': 'sum'
}).reset_index()

# Plot total cases and deaths over time
plt.figure(figsize=(14, 6))
plt.plot(df_grouped['date'], df_grouped['total_confirmed_cases'], label='Total Confirmed Cases')
plt.plot(df_grouped['date'], df_grouped['total_deaths_reported'], label='Total Deaths Reported')
plt.title('Global COVID-19 Cases and Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend()
plt.grid()
plt.show()

Total Cases and Deaths by Continent

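# total_* columns are cumulative, so summing every daily row would count the
# same cases many times. Take each country's maximum (latest) value first,
# then sum those values within each continent.
df_continent = (
    df.groupby(['continent', 'country'])[['total_confirmed_cases', 'total_deaths_reported']]
      .max()
      .groupby(level='continent')
      .sum()
      .reset_index()
)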

# Plot total cases by continent
plt.figure(figsize=(14, 6))
sns.barplot(x='continent', y='total_confirmed_cases', data=df_continent, palette='coolwarm')
plt.title('Total COVID-19 Cases by Continent')
plt.xlabel('Continent')
plt.ylabel('Total Confirmed Cases')
plt.show()

# Plot total deaths by continent
plt.figure(figsize=(14, 6))
sns.barplot(x='continent', y='total_deaths_reported', data=df_continent, palette='coolwarm')
plt.title('Total COVID-19 Deaths by Continent')
plt.xlabel('Continent')
plt.ylabel('Total Deaths Reported')
plt.show()

📌 Step 12: Exporting the Cleaned Dataset

df.to_csv('/content/drive/My Drive/Data Sets/cleaned_covid_data.csv', index=False)

The cleaned DataFrame is written to a new CSV file; index=False leaves the row index out of the file.
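
One way to confirm the export worked is to read the file back and compare it with the DataFrame in memory. The sketch below uses a small assumed frame and a temporary directory instead of the Drive path, so every name in it is illustrative. CSV stores dates as text, so the `date` column has to be parsed again on load.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the cleaned df (assumed columns)
cleaned = pd.DataFrame({
    "country": ["A", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "total_confirmed_cases": [10, 20],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_covid_data.csv")
    cleaned.to_csv(path, index=False)
    # Ask pandas to convert the date column back to datetime on load
    reloaded = pd.read_csv(path, parse_dates=["date"])

same_shape = reloaded.shape == cleaned.shape
same_columns = list(reloaded.columns) == list(cleaned.columns)
dates_restored = pd.api.types.is_datetime64_any_dtype(reloaded["date"])
print(same_shape, same_columns, dates_restored)
```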


📌 Step 13: COVID-19 Data Analysis Report

Summary of Key Findings & Insights

📌 Total Cases & Deaths:

📌 Vaccination Impact:

📌 Testing & Positive Rate:

📌 ICU & Hospitalization Trends:


📊 Moving Averages (Week, Month, Quarter, Year):

📊 Seasonality & Waves:


Country-Level Insights

📌 Top 5 Most Affected Countries:

📌 Regional Differences in Mortality & Recovery:

📌 Lockdown & Policy Impact:


Correlations & Key Insights

✅ Cases vs. Testing: Countries that test more tend to detect more cases, which lowers the risk of underreporting.
✅ Vaccination vs. Death Rate: Higher vaccine coverage is associated with markedly fewer deaths.
✅ ICU Admissions vs. Healthcare Capacity: Countries with fewer ICU beds faced greater strain during peak surges.
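
A relationship like the vaccination one can be checked with a simple correlation on a country-level snapshot. The sketch below uses made-up values for five hypothetical countries; only the column names come from this notebook. A correlation coefficient describes association only, so factors such as age structure or reporting quality could still explain part of any pattern found in the real data.

```python
import pandas as pd

# Toy country-level snapshot (assumed values, one row per country)
snapshot = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "people_fully_vaccinated_per_hundred": [10.0, 30.0, 50.0, 70.0, 90.0],
    "total_deaths_per_million": [2500.0, 2100.0, 1600.0, 1200.0, 700.0],
})

# Pearson correlation between coverage and cumulative deaths per million
corr = snapshot["people_fully_vaccinated_per_hundred"].corr(
    snapshot["total_deaths_per_million"]
)
print(f"Pearson r = {corr:.2f}")
```

With the real DataFrame, the snapshot would first need one row per country, for example the latest available date, before computing the correlation.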


Suggestions for Improvement

1. Data-Driven Policy Adjustments

🔹 Early Testing & Containment: Rapid testing can prevent unchecked outbreaks.
🔹 Localized Lockdowns: Stringent policies in high-risk areas can reduce case surges.

2. Vaccination Strategy Enhancements

🔹 Booster Campaigns: Rolling out booster shots in high-risk regions can curb case spikes.
🔹 Global Equity in Vaccines: Some countries lag behind in vaccine availability and need support.

3. Healthcare Preparedness

🔹 ICU & Hospital Capacity Planning: Investing in healthcare infrastructure can mitigate future crises.
🔹 Medical Supply Chain Optimization: Ensuring availability of PPE, ventilators, and essential drugs is crucial.

4. Public Awareness & Compliance

🔹 Masking & Social Distancing Campaigns: Public adherence improves when policies are clearly communicated.
🔹 Misinformation Control: Governments and health agencies must combat false narratives around COVID-19.

5. Data-Driven Decision Making

🔹 Real-Time Monitoring Dashboards: Governments and organizations should use dynamic dashboards to track case trends, hospitalizations, and vaccinations.
🔹 AI & Predictive Analysis: Machine learning can help predict future outbreaks based on existing patterns.


Final Thoughts

This COVID-19 analysis report summarizes the pandemic's impact, highlighting key trends, challenges, and actionable insights. With interactive dashboards, governments, healthcare professionals, and policymakers can make data-driven decisions to better manage future outbreaks.